In this document, we create the queries and visualizations that drive our reporting of results.
First, we read in the data we used to fit the models.
# read in data
model_df <- read_csv("model-data.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## worker_id = col_character(),
## condition = col_character(),
## start_means = col_logical(),
## gender = col_character(),
## age = col_character(),
## education = col_character(),
## chart_use = col_character(),
## strategy_with_means = col_character(),
## strategy_without_means = col_character(),
## outcome = col_logical(),
## means = col_logical(),
## exclude = col_logical()
## )
## See spec(...) for full column specifications.
# preprocessing
model_df <- model_df %>%
mutate(
# factors for modeling
means = as.factor(means),
start_means = as.factor(start_means),
sd_diff = as.factor(sd_diff),
condition = factor(condition, levels = c("densities","intervals", "HOPs", "QDPs")), # reorder
# evidence scale for decision model
p_diff = p_award_with - (p_award_without + (1 / award_value)),
evidence = qlogis(p_award_with) - qlogis(p_award_without + (1 / award_value))
)
We load the model of probability of superiority judgments that we arrived at through a process of model expansion described in our preregistration (https://osf.io/9kpmb). This is basically a hierarchical linear model of probability of superiority judgments where both the judgments and the ground truth have been transformed onto a log odds scale, making this a linear in log odds (LLO) model. See the paper and experiment/analysis/PSuperiority.Rmd in the supplemental materials for details.
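The log odds variables used by this model (lo_p_sup, lo_ground_truth) are created upstream of this document. As a minimal sketch of the assumed transform, mirroring the qlogis() calls in the evidence scale above (the probability-scale column names here are hypothetical):
# sketch (not run): assumed logit transforms behind the model variables
# lo_p_sup <- qlogis(p_sup)               # judged probability of superiority
# lo_ground_truth <- qlogis(ground_truth) # true probability of superiority
Under this parameterization, a slope of 1 and an intercept of 0 on the log odds scale would correspond to unbiased judgments, and slopes below 1 indicate judgments that regress toward 50%.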
# hierarchical linear log odds model
m.p_sup <- brm(data = model_df, family = "gaussian",
formula = bf(lo_p_sup ~ (1 + lo_ground_truth*trial + means*sd_diff|worker_id) + lo_ground_truth*means*sd_diff*condition*start_means + lo_ground_truth*condition*trial,
sigma ~ (1 + lo_ground_truth + trial|worker_id) + lo_ground_truth*condition*trial + means*start_means),
prior = c(prior(normal(1, 0.5), class = b),
prior(normal(1.3, 1), class = Intercept),
prior(normal(0, 0.15), class = sd, group = worker_id),
prior(normal(0, 0.3), class = b, dpar = sigma),
prior(normal(0, 0.15), class = sd, dpar = sigma),
prior(lkj(4), class = cor)),
iter = 12000, warmup = 2000, chains = 2, cores = 2, thin = 2,
control = list(adapt_delta = 0.99, max_treedepth = 12),
file = "model-fits/llo_mdl-min-r_means_sd_trial_block_sigma_gt_trial_means_block")
summary(m.p_sup)
## Family: gaussian
## Links: mu = identity; sigma = log
## Formula: lo_p_sup ~ (1 + lo_ground_truth * trial + means * sd_diff | worker_id) + lo_ground_truth * means * sd_diff * condition * start_means + lo_ground_truth * condition * trial
## sigma ~ (1 + lo_ground_truth + trial | worker_id) + lo_ground_truth * condition * trial + means * start_means
## Data: model_df (Number of observations: 19892)
## Samples: 2 chains, each with iter = 12000; warmup = 2000; thin = 2;
## total post-warmup samples = 10000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 622)
## Estimate Est.Error l-95% CI
## sd(Intercept) 0.06 0.01 0.05
## sd(lo_ground_truth) 0.39 0.01 0.37
## sd(trial) 0.03 0.01 0.00
## sd(meansTRUE) 0.03 0.01 0.02
## sd(sd_diff15) 0.08 0.01 0.07
## sd(lo_ground_truth:trial) 0.24 0.02 0.21
## sd(meansTRUE:sd_diff15) 0.06 0.01 0.04
## sd(sigma_Intercept) 1.18 0.03 1.12
## sd(sigma_lo_ground_truth) 0.41 0.01 0.38
## sd(sigma_trial) 1.18 0.04 1.11
## cor(Intercept,lo_ground_truth) -0.47 0.09 -0.64
## cor(Intercept,trial) 0.20 0.23 -0.30
## cor(lo_ground_truth,trial) -0.24 0.23 -0.64
## cor(Intercept,meansTRUE) 0.03 0.19 -0.32
## cor(lo_ground_truth,meansTRUE) -0.60 0.13 -0.81
## cor(trial,meansTRUE) 0.19 0.24 -0.33
## cor(Intercept,sd_diff15) -0.01 0.11 -0.22
## cor(lo_ground_truth,sd_diff15) 0.03 0.09 -0.15
## cor(trial,sd_diff15) 0.01 0.22 -0.44
## cor(meansTRUE,sd_diff15) 0.01 0.17 -0.33
## cor(Intercept,lo_ground_truth:trial) -0.28 0.10 -0.46
## cor(lo_ground_truth,lo_ground_truth:trial) 0.41 0.06 0.29
## cor(trial,lo_ground_truth:trial) -0.34 0.24 -0.71
## cor(meansTRUE,lo_ground_truth:trial) -0.14 0.16 -0.44
## cor(sd_diff15,lo_ground_truth:trial) 0.07 0.09 -0.10
## cor(Intercept,meansTRUE:sd_diff15) -0.33 0.13 -0.59
## cor(lo_ground_truth,meansTRUE:sd_diff15) 0.23 0.13 -0.04
## cor(trial,meansTRUE:sd_diff15) 0.16 0.23 -0.32
## cor(meansTRUE,meansTRUE:sd_diff15) 0.03 0.19 -0.33
## cor(sd_diff15,meansTRUE:sd_diff15) -0.30 0.12 -0.52
## cor(lo_ground_truth:trial,meansTRUE:sd_diff15) -0.12 0.12 -0.36
## cor(sigma_Intercept,sigma_lo_ground_truth) -0.71 0.02 -0.75
## cor(sigma_Intercept,sigma_trial) 0.10 0.04 0.02
## cor(sigma_lo_ground_truth,sigma_trial) -0.05 0.04 -0.13
## u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.07 1.00 4599 6863
## sd(lo_ground_truth) 0.42 1.00 2957 6210
## sd(trial) 0.06 1.00 1202 2432
## sd(meansTRUE) 0.04 1.00 1557 2756
## sd(sd_diff15) 0.09 1.00 4960 7384
## sd(lo_ground_truth:trial) 0.27 1.00 2197 5227
## sd(meansTRUE:sd_diff15) 0.07 1.00 4550 7284
## sd(sigma_Intercept) 1.25 1.00 3965 6120
## sd(sigma_lo_ground_truth) 0.43 1.00 5222 7056
## sd(sigma_trial) 1.26 1.00 6867 8631
## cor(Intercept,lo_ground_truth) -0.28 1.00 465 1148
## cor(Intercept,trial) 0.60 1.00 6938 8874
## cor(lo_ground_truth,trial) 0.28 1.00 5378 6303
## cor(Intercept,meansTRUE) 0.41 1.00 2445 4981
## cor(lo_ground_truth,meansTRUE) -0.30 1.00 2912 6200
## cor(trial,meansTRUE) 0.62 1.00 2342 4018
## cor(Intercept,sd_diff15) 0.21 1.00 3239 5650
## cor(lo_ground_truth,sd_diff15) 0.20 1.00 4570 8400
## cor(trial,sd_diff15) 0.44 1.00 458 802
## cor(meansTRUE,sd_diff15) 0.32 1.00 836 1725
## cor(Intercept,lo_ground_truth:trial) -0.08 1.00 1236 3298
## cor(lo_ground_truth,lo_ground_truth:trial) 0.53 1.00 6874 8129
## cor(trial,lo_ground_truth:trial) 0.21 1.00 420 785
## cor(meansTRUE,lo_ground_truth:trial) 0.19 1.00 699 1635
## cor(sd_diff15,lo_ground_truth:trial) 0.23 1.00 3400 6099
## cor(Intercept,meansTRUE:sd_diff15) -0.06 1.00 4618 7518
## cor(lo_ground_truth,meansTRUE:sd_diff15) 0.48 1.00 5079 8054
## cor(trial,meansTRUE:sd_diff15) 0.58 1.00 1129 2019
## cor(meansTRUE,meansTRUE:sd_diff15) 0.41 1.00 2273 4918
## cor(sd_diff15,meansTRUE:sd_diff15) -0.05 1.00 4774 6954
## cor(lo_ground_truth:trial,meansTRUE:sd_diff15) 0.12 1.00 3833 8074
## cor(sigma_Intercept,sigma_lo_ground_truth) -0.67 1.00 5963 8590
## cor(sigma_Intercept,sigma_trial) 0.17 1.00 6192 7482
## cor(sigma_lo_ground_truth,sigma_trial) 0.04 1.00 4770 6838
##
## Population-Level Effects:
## Estimate
## Intercept -0.02
## sigma_Intercept -1.72
## lo_ground_truth 0.46
## meansTRUE -0.01
## sd_diff15 0.04
## conditionHOPs -0.09
## conditionintervals -0.01
## conditionQDPs 0.02
## start_meansTRUE 0.01
## trial -0.05
## lo_ground_truth:meansTRUE -0.04
## lo_ground_truth:sd_diff15 0.08
## meansTRUE:sd_diff15 0.02
## lo_ground_truth:conditionHOPs -0.01
## lo_ground_truth:conditionintervals -0.10
## lo_ground_truth:conditionQDPs 0.07
## meansTRUE:conditionHOPs 0.09
## meansTRUE:conditionintervals 0.02
## meansTRUE:conditionQDPs -0.02
## sd_diff15:conditionHOPs 0.03
## sd_diff15:conditionintervals 0.02
## sd_diff15:conditionQDPs -0.01
## lo_ground_truth:start_meansTRUE -0.14
## meansTRUE:start_meansTRUE -0.01
## sd_diff15:start_meansTRUE 0.01
## conditionHOPs:start_meansTRUE 0.08
## conditionintervals:start_meansTRUE 0.00
## conditionQDPs:start_meansTRUE -0.01
## lo_ground_truth:trial 0.12
## conditionHOPs:trial 0.01
## conditionintervals:trial 0.03
## conditionQDPs:trial 0.05
## lo_ground_truth:meansTRUE:sd_diff15 0.04
## lo_ground_truth:meansTRUE:conditionHOPs -0.08
## lo_ground_truth:meansTRUE:conditionintervals -0.01
## lo_ground_truth:meansTRUE:conditionQDPs -0.01
## lo_ground_truth:sd_diff15:conditionHOPs 0.05
## lo_ground_truth:sd_diff15:conditionintervals -0.01
## lo_ground_truth:sd_diff15:conditionQDPs 0.02
## meansTRUE:sd_diff15:conditionHOPs -0.01
## meansTRUE:sd_diff15:conditionintervals -0.02
## meansTRUE:sd_diff15:conditionQDPs -0.00
## lo_ground_truth:meansTRUE:start_meansTRUE 0.04
## lo_ground_truth:sd_diff15:start_meansTRUE 0.02
## meansTRUE:sd_diff15:start_meansTRUE -0.02
## lo_ground_truth:conditionHOPs:start_meansTRUE -0.07
## lo_ground_truth:conditionintervals:start_meansTRUE 0.04
## lo_ground_truth:conditionQDPs:start_meansTRUE 0.14
## meansTRUE:conditionHOPs:start_meansTRUE -0.09
## meansTRUE:conditionintervals:start_meansTRUE 0.01
## meansTRUE:conditionQDPs:start_meansTRUE 0.02
## sd_diff15:conditionHOPs:start_meansTRUE -0.02
## sd_diff15:conditionintervals:start_meansTRUE -0.01
## sd_diff15:conditionQDPs:start_meansTRUE -0.02
## lo_ground_truth:conditionHOPs:trial -0.02
## lo_ground_truth:conditionintervals:trial 0.01
## lo_ground_truth:conditionQDPs:trial 0.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs -0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 0.03
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs -0.03
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE 0.04
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE 0.12
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE 0.03
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE -0.01
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE 0.02
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE 0.01
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE -0.01
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.05
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.03
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE -0.08
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE -0.05
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE -0.01
## sigma_lo_ground_truth 0.46
## sigma_conditionHOPs 0.59
## sigma_conditionintervals 0.17
## sigma_conditionQDPs -0.05
## sigma_trial -0.46
## sigma_meansTRUE -0.00
## sigma_start_meansTRUE -0.04
## sigma_lo_ground_truth:conditionHOPs -0.18
## sigma_lo_ground_truth:conditionintervals -0.10
## sigma_lo_ground_truth:conditionQDPs -0.03
## sigma_lo_ground_truth:trial 0.02
## sigma_conditionHOPs:trial 0.08
## sigma_conditionintervals:trial 0.14
## sigma_conditionQDPs:trial -0.03
## sigma_meansTRUE:start_meansTRUE -0.22
## sigma_lo_ground_truth:conditionHOPs:trial 0.04
## sigma_lo_ground_truth:conditionintervals:trial 0.06
## sigma_lo_ground_truth:conditionQDPs:trial -0.02
## Est.Error
## Intercept 0.02
## sigma_Intercept 0.09
## lo_ground_truth 0.04
## meansTRUE 0.02
## sd_diff15 0.02
## conditionHOPs 0.03
## conditionintervals 0.02
## conditionQDPs 0.02
## start_meansTRUE 0.02
## trial 0.02
## lo_ground_truth:meansTRUE 0.02
## lo_ground_truth:sd_diff15 0.02
## meansTRUE:sd_diff15 0.02
## lo_ground_truth:conditionHOPs 0.07
## lo_ground_truth:conditionintervals 0.06
## lo_ground_truth:conditionQDPs 0.06
## meansTRUE:conditionHOPs 0.03
## meansTRUE:conditionintervals 0.02
## meansTRUE:conditionQDPs 0.03
## sd_diff15:conditionHOPs 0.04
## sd_diff15:conditionintervals 0.03
## sd_diff15:conditionQDPs 0.03
## lo_ground_truth:start_meansTRUE 0.06
## meansTRUE:start_meansTRUE 0.03
## sd_diff15:start_meansTRUE 0.03
## conditionHOPs:start_meansTRUE 0.04
## conditionintervals:start_meansTRUE 0.03
## conditionQDPs:start_meansTRUE 0.03
## lo_ground_truth:trial 0.03
## conditionHOPs:trial 0.04
## conditionintervals:trial 0.03
## conditionQDPs:trial 0.03
## lo_ground_truth:meansTRUE:sd_diff15 0.02
## lo_ground_truth:meansTRUE:conditionHOPs 0.04
## lo_ground_truth:meansTRUE:conditionintervals 0.03
## lo_ground_truth:meansTRUE:conditionQDPs 0.03
## lo_ground_truth:sd_diff15:conditionHOPs 0.03
## lo_ground_truth:sd_diff15:conditionintervals 0.02
## lo_ground_truth:sd_diff15:conditionQDPs 0.03
## meansTRUE:sd_diff15:conditionHOPs 0.04
## meansTRUE:sd_diff15:conditionintervals 0.03
## meansTRUE:sd_diff15:conditionQDPs 0.03
## lo_ground_truth:meansTRUE:start_meansTRUE 0.03
## lo_ground_truth:sd_diff15:start_meansTRUE 0.02
## meansTRUE:sd_diff15:start_meansTRUE 0.03
## lo_ground_truth:conditionHOPs:start_meansTRUE 0.09
## lo_ground_truth:conditionintervals:start_meansTRUE 0.09
## lo_ground_truth:conditionQDPs:start_meansTRUE 0.09
## meansTRUE:conditionHOPs:start_meansTRUE 0.05
## meansTRUE:conditionintervals:start_meansTRUE 0.04
## meansTRUE:conditionQDPs:start_meansTRUE 0.04
## sd_diff15:conditionHOPs:start_meansTRUE 0.05
## sd_diff15:conditionintervals:start_meansTRUE 0.04
## sd_diff15:conditionQDPs:start_meansTRUE 0.04
## lo_ground_truth:conditionHOPs:trial 0.05
## lo_ground_truth:conditionintervals:trial 0.04
## lo_ground_truth:conditionQDPs:trial 0.05
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs 0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 0.03
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs 0.03
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE 0.03
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE 0.05
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE 0.04
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE 0.04
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE 0.04
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE 0.03
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE 0.03
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.05
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.04
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.05
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.04
## sigma_lo_ground_truth 0.03
## sigma_conditionHOPs 0.12
## sigma_conditionintervals 0.12
## sigma_conditionQDPs 0.12
## sigma_trial 0.10
## sigma_meansTRUE 0.03
## sigma_start_meansTRUE 0.07
## sigma_lo_ground_truth:conditionHOPs 0.05
## sigma_lo_ground_truth:conditionintervals 0.05
## sigma_lo_ground_truth:conditionQDPs 0.05
## sigma_lo_ground_truth:trial 0.05
## sigma_conditionHOPs:trial 0.14
## sigma_conditionintervals:trial 0.14
## sigma_conditionQDPs:trial 0.14
## sigma_meansTRUE:start_meansTRUE 0.05
## sigma_lo_ground_truth:conditionHOPs:trial 0.07
## sigma_lo_ground_truth:conditionintervals:trial 0.07
## sigma_lo_ground_truth:conditionQDPs:trial 0.07
## l-95% CI
## Intercept -0.05
## sigma_Intercept -1.90
## lo_ground_truth 0.37
## meansTRUE -0.04
## sd_diff15 -0.00
## conditionHOPs -0.14
## conditionintervals -0.05
## conditionQDPs -0.02
## start_meansTRUE -0.03
## trial -0.10
## lo_ground_truth:meansTRUE -0.08
## lo_ground_truth:sd_diff15 0.04
## meansTRUE:sd_diff15 -0.02
## lo_ground_truth:conditionHOPs -0.14
## lo_ground_truth:conditionintervals -0.23
## lo_ground_truth:conditionQDPs -0.06
## meansTRUE:conditionHOPs 0.02
## meansTRUE:conditionintervals -0.03
## meansTRUE:conditionQDPs -0.07
## sd_diff15:conditionHOPs -0.04
## sd_diff15:conditionintervals -0.03
## sd_diff15:conditionQDPs -0.06
## lo_ground_truth:start_meansTRUE -0.26
## meansTRUE:start_meansTRUE -0.07
## sd_diff15:start_meansTRUE -0.04
## conditionHOPs:start_meansTRUE 0.01
## conditionintervals:start_meansTRUE -0.05
## conditionQDPs:start_meansTRUE -0.07
## lo_ground_truth:trial 0.06
## conditionHOPs:trial -0.07
## conditionintervals:trial -0.02
## conditionQDPs:trial -0.01
## lo_ground_truth:meansTRUE:sd_diff15 -0.01
## lo_ground_truth:meansTRUE:conditionHOPs -0.15
## lo_ground_truth:meansTRUE:conditionintervals -0.07
## lo_ground_truth:meansTRUE:conditionQDPs -0.06
## lo_ground_truth:sd_diff15:conditionHOPs -0.01
## lo_ground_truth:sd_diff15:conditionintervals -0.06
## lo_ground_truth:sd_diff15:conditionQDPs -0.03
## meansTRUE:sd_diff15:conditionHOPs -0.09
## meansTRUE:sd_diff15:conditionintervals -0.08
## meansTRUE:sd_diff15:conditionQDPs -0.06
## lo_ground_truth:meansTRUE:start_meansTRUE -0.02
## lo_ground_truth:sd_diff15:start_meansTRUE -0.02
## meansTRUE:sd_diff15:start_meansTRUE -0.07
## lo_ground_truth:conditionHOPs:start_meansTRUE -0.25
## lo_ground_truth:conditionintervals:start_meansTRUE -0.13
## lo_ground_truth:conditionQDPs:start_meansTRUE -0.03
## meansTRUE:conditionHOPs:start_meansTRUE -0.19
## meansTRUE:conditionintervals:start_meansTRUE -0.07
## meansTRUE:conditionQDPs:start_meansTRUE -0.06
## sd_diff15:conditionHOPs:start_meansTRUE -0.11
## sd_diff15:conditionintervals:start_meansTRUE -0.08
## sd_diff15:conditionQDPs:start_meansTRUE -0.09
## lo_ground_truth:conditionHOPs:trial -0.12
## lo_ground_truth:conditionintervals:trial -0.08
## lo_ground_truth:conditionQDPs:trial -0.09
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs -0.09
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals -0.03
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs -0.09
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE -0.01
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE 0.02
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE -0.05
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE -0.08
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE -0.05
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE -0.04
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE -0.07
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE -0.06
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE -0.05
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE -0.06
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE -0.17
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE -0.12
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE -0.08
## sigma_lo_ground_truth 0.39
## sigma_conditionHOPs 0.36
## sigma_conditionintervals -0.07
## sigma_conditionQDPs -0.29
## sigma_trial -0.67
## sigma_meansTRUE -0.06
## sigma_start_meansTRUE -0.18
## sigma_lo_ground_truth:conditionHOPs -0.27
## sigma_lo_ground_truth:conditionintervals -0.20
## sigma_lo_ground_truth:conditionQDPs -0.12
## sigma_lo_ground_truth:trial -0.07
## sigma_conditionHOPs:trial -0.19
## sigma_conditionintervals:trial -0.14
## sigma_conditionQDPs:trial -0.31
## sigma_meansTRUE:start_meansTRUE -0.32
## sigma_lo_ground_truth:conditionHOPs:trial -0.09
## sigma_lo_ground_truth:conditionintervals:trial -0.08
## sigma_lo_ground_truth:conditionQDPs:trial -0.16
## u-95% CI
## Intercept 0.01
## sigma_Intercept -1.54
## lo_ground_truth 0.54
## meansTRUE 0.03
## sd_diff15 0.08
## conditionHOPs -0.03
## conditionintervals 0.03
## conditionQDPs 0.06
## start_meansTRUE 0.05
## trial -0.01
## lo_ground_truth:meansTRUE -0.00
## lo_ground_truth:sd_diff15 0.12
## meansTRUE:sd_diff15 0.06
## lo_ground_truth:conditionHOPs 0.11
## lo_ground_truth:conditionintervals 0.02
## lo_ground_truth:conditionQDPs 0.19
## meansTRUE:conditionHOPs 0.15
## meansTRUE:conditionintervals 0.06
## meansTRUE:conditionQDPs 0.03
## sd_diff15:conditionHOPs 0.10
## sd_diff15:conditionintervals 0.07
## sd_diff15:conditionQDPs 0.05
## lo_ground_truth:start_meansTRUE -0.02
## meansTRUE:start_meansTRUE 0.04
## sd_diff15:start_meansTRUE 0.06
## conditionHOPs:start_meansTRUE 0.15
## conditionintervals:start_meansTRUE 0.06
## conditionQDPs:start_meansTRUE 0.04
## lo_ground_truth:trial 0.18
## conditionHOPs:trial 0.09
## conditionintervals:trial 0.09
## conditionQDPs:trial 0.10
## lo_ground_truth:meansTRUE:sd_diff15 0.09
## lo_ground_truth:meansTRUE:conditionHOPs -0.01
## lo_ground_truth:meansTRUE:conditionintervals 0.04
## lo_ground_truth:meansTRUE:conditionQDPs 0.05
## lo_ground_truth:sd_diff15:conditionHOPs 0.11
## lo_ground_truth:sd_diff15:conditionintervals 0.03
## lo_ground_truth:sd_diff15:conditionQDPs 0.07
## meansTRUE:sd_diff15:conditionHOPs 0.07
## meansTRUE:sd_diff15:conditionintervals 0.04
## meansTRUE:sd_diff15:conditionQDPs 0.06
## lo_ground_truth:meansTRUE:start_meansTRUE 0.09
## lo_ground_truth:sd_diff15:start_meansTRUE 0.06
## meansTRUE:sd_diff15:start_meansTRUE 0.04
## lo_ground_truth:conditionHOPs:start_meansTRUE 0.11
## lo_ground_truth:conditionintervals:start_meansTRUE 0.21
## lo_ground_truth:conditionQDPs:start_meansTRUE 0.31
## meansTRUE:conditionHOPs:start_meansTRUE 0.01
## meansTRUE:conditionintervals:start_meansTRUE 0.08
## meansTRUE:conditionQDPs:start_meansTRUE 0.09
## sd_diff15:conditionHOPs:start_meansTRUE 0.07
## sd_diff15:conditionintervals:start_meansTRUE 0.06
## sd_diff15:conditionQDPs:start_meansTRUE 0.05
## lo_ground_truth:conditionHOPs:trial 0.08
## lo_ground_truth:conditionintervals:trial 0.09
## lo_ground_truth:conditionQDPs:trial 0.09
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs 0.06
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 0.08
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs 0.03
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE 0.10
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE 0.22
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE 0.11
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE 0.07
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE 0.09
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE 0.06
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE 0.05
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.15
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.10
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.10
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.03
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.07
## sigma_lo_ground_truth 0.52
## sigma_conditionHOPs 0.83
## sigma_conditionintervals 0.41
## sigma_conditionQDPs 0.20
## sigma_trial -0.26
## sigma_meansTRUE 0.06
## sigma_start_meansTRUE 0.09
## sigma_lo_ground_truth:conditionHOPs -0.09
## sigma_lo_ground_truth:conditionintervals -0.01
## sigma_lo_ground_truth:conditionQDPs 0.06
## sigma_lo_ground_truth:trial 0.12
## sigma_conditionHOPs:trial 0.37
## sigma_conditionintervals:trial 0.42
## sigma_conditionQDPs:trial 0.24
## sigma_meansTRUE:start_meansTRUE -0.13
## sigma_lo_ground_truth:conditionHOPs:trial 0.18
## sigma_lo_ground_truth:conditionintervals:trial 0.20
## sigma_lo_ground_truth:conditionQDPs:trial 0.11
## Rhat
## Intercept 1.00
## sigma_Intercept 1.00
## lo_ground_truth 1.00
## meansTRUE 1.00
## sd_diff15 1.00
## conditionHOPs 1.00
## conditionintervals 1.00
## conditionQDPs 1.00
## start_meansTRUE 1.00
## trial 1.00
## lo_ground_truth:meansTRUE 1.00
## lo_ground_truth:sd_diff15 1.00
## meansTRUE:sd_diff15 1.00
## lo_ground_truth:conditionHOPs 1.00
## lo_ground_truth:conditionintervals 1.00
## lo_ground_truth:conditionQDPs 1.00
## meansTRUE:conditionHOPs 1.00
## meansTRUE:conditionintervals 1.00
## meansTRUE:conditionQDPs 1.00
## sd_diff15:conditionHOPs 1.00
## sd_diff15:conditionintervals 1.00
## sd_diff15:conditionQDPs 1.00
## lo_ground_truth:start_meansTRUE 1.00
## meansTRUE:start_meansTRUE 1.00
## sd_diff15:start_meansTRUE 1.00
## conditionHOPs:start_meansTRUE 1.00
## conditionintervals:start_meansTRUE 1.00
## conditionQDPs:start_meansTRUE 1.00
## lo_ground_truth:trial 1.00
## conditionHOPs:trial 1.00
## conditionintervals:trial 1.00
## conditionQDPs:trial 1.00
## lo_ground_truth:meansTRUE:sd_diff15 1.00
## lo_ground_truth:meansTRUE:conditionHOPs 1.00
## lo_ground_truth:meansTRUE:conditionintervals 1.00
## lo_ground_truth:meansTRUE:conditionQDPs 1.00
## lo_ground_truth:sd_diff15:conditionHOPs 1.00
## lo_ground_truth:sd_diff15:conditionintervals 1.00
## lo_ground_truth:sd_diff15:conditionQDPs 1.00
## meansTRUE:sd_diff15:conditionHOPs 1.00
## meansTRUE:sd_diff15:conditionintervals 1.00
## meansTRUE:sd_diff15:conditionQDPs 1.00
## lo_ground_truth:meansTRUE:start_meansTRUE 1.00
## lo_ground_truth:sd_diff15:start_meansTRUE 1.00
## meansTRUE:sd_diff15:start_meansTRUE 1.00
## lo_ground_truth:conditionHOPs:start_meansTRUE 1.00
## lo_ground_truth:conditionintervals:start_meansTRUE 1.00
## lo_ground_truth:conditionQDPs:start_meansTRUE 1.00
## meansTRUE:conditionHOPs:start_meansTRUE 1.00
## meansTRUE:conditionintervals:start_meansTRUE 1.00
## meansTRUE:conditionQDPs:start_meansTRUE 1.00
## sd_diff15:conditionHOPs:start_meansTRUE 1.00
## sd_diff15:conditionintervals:start_meansTRUE 1.00
## sd_diff15:conditionQDPs:start_meansTRUE 1.00
## lo_ground_truth:conditionHOPs:trial 1.00
## lo_ground_truth:conditionintervals:trial 1.00
## lo_ground_truth:conditionQDPs:trial 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs 1.00
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE 1.00
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE 1.00
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE 1.00
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE 1.00
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE 1.00
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE 1.00
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE 1.00
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 1.00
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 1.00
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 1.00
## sigma_lo_ground_truth 1.00
## sigma_conditionHOPs 1.00
## sigma_conditionintervals 1.00
## sigma_conditionQDPs 1.00
## sigma_trial 1.00
## sigma_meansTRUE 1.00
## sigma_start_meansTRUE 1.00
## sigma_lo_ground_truth:conditionHOPs 1.00
## sigma_lo_ground_truth:conditionintervals 1.00
## sigma_lo_ground_truth:conditionQDPs 1.00
## sigma_lo_ground_truth:trial 1.00
## sigma_conditionHOPs:trial 1.00
## sigma_conditionintervals:trial 1.00
## sigma_conditionQDPs:trial 1.00
## sigma_meansTRUE:start_meansTRUE 1.00
## sigma_lo_ground_truth:conditionHOPs:trial 1.00
## sigma_lo_ground_truth:conditionintervals:trial 1.00
## sigma_lo_ground_truth:conditionQDPs:trial 1.00
## Bulk_ESS
## Intercept 3924
## sigma_Intercept 2585
## lo_ground_truth 4594
## meansTRUE 3430
## sd_diff15 4192
## conditionHOPs 4624
## conditionintervals 4464
## conditionQDPs 3974
## start_meansTRUE 3831
## trial 4666
## lo_ground_truth:meansTRUE 3466
## lo_ground_truth:sd_diff15 3578
## meansTRUE:sd_diff15 3710
## lo_ground_truth:conditionHOPs 5234
## lo_ground_truth:conditionintervals 5029
## lo_ground_truth:conditionQDPs 4672
## meansTRUE:conditionHOPs 4248
## meansTRUE:conditionintervals 3939
## meansTRUE:conditionQDPs 3901
## sd_diff15:conditionHOPs 4952
## sd_diff15:conditionintervals 4834
## sd_diff15:conditionQDPs 4946
## lo_ground_truth:start_meansTRUE 4691
## meansTRUE:start_meansTRUE 3474
## sd_diff15:start_meansTRUE 4225
## conditionHOPs:start_meansTRUE 4886
## conditionintervals:start_meansTRUE 4380
## conditionQDPs:start_meansTRUE 4003
## lo_ground_truth:trial 5324
## conditionHOPs:trial 5802
## conditionintervals:trial 4910
## conditionQDPs:trial 5188
## lo_ground_truth:meansTRUE:sd_diff15 3618
## lo_ground_truth:meansTRUE:conditionHOPs 4084
## lo_ground_truth:meansTRUE:conditionintervals 3972
## lo_ground_truth:meansTRUE:conditionQDPs 4126
## lo_ground_truth:sd_diff15:conditionHOPs 4734
## lo_ground_truth:sd_diff15:conditionintervals 4104
## lo_ground_truth:sd_diff15:conditionQDPs 3918
## meansTRUE:sd_diff15:conditionHOPs 4782
## meansTRUE:sd_diff15:conditionintervals 4410
## meansTRUE:sd_diff15:conditionQDPs 4249
## lo_ground_truth:meansTRUE:start_meansTRUE 3451
## lo_ground_truth:sd_diff15:start_meansTRUE 3728
## meansTRUE:sd_diff15:start_meansTRUE 3787
## lo_ground_truth:conditionHOPs:start_meansTRUE 5402
## lo_ground_truth:conditionintervals:start_meansTRUE 5033
## lo_ground_truth:conditionQDPs:start_meansTRUE 4788
## meansTRUE:conditionHOPs:start_meansTRUE 4442
## meansTRUE:conditionintervals:start_meansTRUE 3838
## meansTRUE:conditionQDPs:start_meansTRUE 3889
## sd_diff15:conditionHOPs:start_meansTRUE 5159
## sd_diff15:conditionintervals:start_meansTRUE 5152
## sd_diff15:conditionQDPs:start_meansTRUE 5108
## lo_ground_truth:conditionHOPs:trial 6163
## lo_ground_truth:conditionintervals:trial 5776
## lo_ground_truth:conditionQDPs:trial 5577
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs 4466
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 4015
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs 4034
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE 3803
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE 4153
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE 3911
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE 4041
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE 5020
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE 4486
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE 4676
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 5278
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 4595
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 4353
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 4850
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 4310
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 4213
## sigma_lo_ground_truth 3727
## sigma_conditionHOPs 2626
## sigma_conditionintervals 2458
## sigma_conditionQDPs 2225
## sigma_trial 6609
## sigma_meansTRUE 9256
## sigma_start_meansTRUE 3124
## sigma_lo_ground_truth:conditionHOPs 4008
## sigma_lo_ground_truth:conditionintervals 3311
## sigma_lo_ground_truth:conditionQDPs 3568
## sigma_lo_ground_truth:trial 8234
## sigma_conditionHOPs:trial 6675
## sigma_conditionintervals:trial 6981
## sigma_conditionQDPs:trial 7431
## sigma_meansTRUE:start_meansTRUE 9057
## sigma_lo_ground_truth:conditionHOPs:trial 8181
## sigma_lo_ground_truth:conditionintervals:trial 8377
## sigma_lo_ground_truth:conditionQDPs:trial 8103
## Tail_ESS
## Intercept 6057
## sigma_Intercept 4770
## lo_ground_truth 6984
## meansTRUE 5961
## sd_diff15 6348
## conditionHOPs 7735
## conditionintervals 6604
## conditionQDPs 6359
## start_meansTRUE 5965
## trial 7078
## lo_ground_truth:meansTRUE 6159
## lo_ground_truth:sd_diff15 5348
## meansTRUE:sd_diff15 6548
## lo_ground_truth:conditionHOPs 7679
## lo_ground_truth:conditionintervals 7084
## lo_ground_truth:conditionQDPs 7120
## meansTRUE:conditionHOPs 7366
## meansTRUE:conditionintervals 6680
## meansTRUE:conditionQDPs 6540
## sd_diff15:conditionHOPs 7733
## sd_diff15:conditionintervals 7238
## sd_diff15:conditionQDPs 6504
## lo_ground_truth:start_meansTRUE 6881
## meansTRUE:start_meansTRUE 5693
## sd_diff15:start_meansTRUE 7179
## conditionHOPs:start_meansTRUE 7785
## conditionintervals:start_meansTRUE 6984
## conditionQDPs:start_meansTRUE 6740
## lo_ground_truth:trial 7303
## conditionHOPs:trial 8127
## conditionintervals:trial 7612
## conditionQDPs:trial 7242
## lo_ground_truth:meansTRUE:sd_diff15 5876
## lo_ground_truth:meansTRUE:conditionHOPs 6653
## lo_ground_truth:meansTRUE:conditionintervals 6798
## lo_ground_truth:meansTRUE:conditionQDPs 6750
## lo_ground_truth:sd_diff15:conditionHOPs 7731
## lo_ground_truth:sd_diff15:conditionintervals 6018
## lo_ground_truth:sd_diff15:conditionQDPs 6737
## meansTRUE:sd_diff15:conditionHOPs 7564
## meansTRUE:sd_diff15:conditionintervals 7271
## meansTRUE:sd_diff15:conditionQDPs 6517
## lo_ground_truth:meansTRUE:start_meansTRUE 5677
## lo_ground_truth:sd_diff15:start_meansTRUE 6491
## meansTRUE:sd_diff15:start_meansTRUE 6134
## lo_ground_truth:conditionHOPs:start_meansTRUE 7492
## lo_ground_truth:conditionintervals:start_meansTRUE 7325
## lo_ground_truth:conditionQDPs:start_meansTRUE 7088
## meansTRUE:conditionHOPs:start_meansTRUE 7153
## meansTRUE:conditionintervals:start_meansTRUE 5911
## meansTRUE:conditionQDPs:start_meansTRUE 5949
## sd_diff15:conditionHOPs:start_meansTRUE 7609
## sd_diff15:conditionintervals:start_meansTRUE 7213
## sd_diff15:conditionQDPs:start_meansTRUE 7802
## lo_ground_truth:conditionHOPs:trial 8303
## lo_ground_truth:conditionintervals:trial 7710
## lo_ground_truth:conditionQDPs:trial 6998
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs 7493
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 6492
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs 6440
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE 6349
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE 6671
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE 6719
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE 6116
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE 7744
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE 7303
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE 6843
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 7007
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 7513
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 7164
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 6958
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 6855
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 6977
## sigma_lo_ground_truth 6198
## sigma_conditionHOPs 4968
## sigma_conditionintervals 4718
## sigma_conditionQDPs 4207
## sigma_trial 8221
## sigma_meansTRUE 8800
## sigma_start_meansTRUE 5883
## sigma_lo_ground_truth:conditionHOPs 7173
## sigma_lo_ground_truth:conditionintervals 6236
## sigma_lo_ground_truth:conditionQDPs 6196
## sigma_lo_ground_truth:trial 8697
## sigma_conditionHOPs:trial 8350
## sigma_conditionintervals:trial 8367
## sigma_conditionQDPs:trial 9156
## sigma_meansTRUE:start_meansTRUE 8632
## sigma_lo_ground_truth:conditionHOPs:trial 8474
## sigma_lo_ground_truth:conditionintervals:trial 8765
## sigma_lo_ground_truth:conditionQDPs:trial 8967
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The primary results about probability of superiority that we present in the paper concern the three-way interaction between the ground truth probability of superiority, the presence or absence of extrinsic means, and the level of variance shown (lo_ground_truth*means*sd_diff) for each uncertainty visualization format we tested. To show this effect, we examine how the slope of the linear in log odds (LLO) model changes as a function of extrinsic means, variance shown, and visualization format. The charts below highlight this effect.
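The pipelines below recover this slope by getting fitted draws at lo_ground_truth = 0 and 1 and differencing them, which for a line returns exactly the slope coefficient. A minimal numeric sketch of that identity, plugging in roughly the population-level Intercept and lo_ground_truth estimates from the summary above:
# why differencing fits at lo_ground_truth = 0 and 1 recovers the LLO slope:
# for any line f(x) = a + b * x, f(1) - f(0) = (a + b) - a = b
a <- -0.02                # rough population-level intercept from the summary above
b <- 0.46                 # rough population-level lo_ground_truth slope
f <- function(x) a + b * x
f(1) - f(0)               # returns 0.46, i.e., b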
model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.p_sup, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(means, sd_diff, condition, .draw) %>% # group by predictors to keep
summarise(slope = mean(slope)) %>% # marginalize out other predictors by taking a weighted average
ggplot(aes(x = slope, y = condition, group = means, fill = means)) +
stat_slabh(alpha = 0.35) +
labs(
title = "Slopes in Linear Log Odds Model",
x = "Slope",
y = "Visualization",
fill = "Means Present"
) +
theme_minimal() +
facet_grid(sd_diff ~ .)
We’ll break this chart down into contrasts and contrasts of contrasts to do some visual reliability testing.
model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.p_sup, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(means, sd_diff, condition, .draw) %>% # group by predictors to keep
summarise(slope = mean(slope)) %>% # marginalize out other predictors by taking a weighted average
compare_levels(slope, by = means) %>% # contrast mean present - absent
ggplot(aes(x = slope, y = condition)) +
stat_halfeyeh() +
labs(
title = "Effect of Means on LLO Slopes",
x = "Slope Difference (Means present - absent)",
y = "Visualization"
) +
theme_minimal() +
facet_grid(sd_diff ~ .)
model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.p_sup, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(means, sd_diff, condition, .draw) %>% # group by predictors to keep
summarise(slope = mean(slope)) %>% # marginalize out other predictors by taking a weighted average
compare_levels(slope, by = means) %>% # contrast mean present - absent
compare_levels(slope, by = sd_diff) %>% # contrast sd_diff: high (15) - low (5) variance
ggplot(aes(x = slope, y = condition)) +
stat_halfeyeh() +
labs(
title = "Effect of Variance on the Effect of Extrinsic Means",
x = "Difference in Slope Differences (Effect of means at high - low uncertainty)",
y = "Visualization"
) +
theme_minimal()
It looks like extrinsic means lead to greater underestimation of probability of superiority (lower LLO slopes) when variance is low, regardless of visualization condition. This is the effect we expected to see. Surprisingly, the impact of extrinsic means does not seem to depend on the intrinsic salience of the mean in the uncertainty visualization conditions. At high levels of variance, extrinsic means improve slopes for intervals and densities but still reduce slopes for HOPs.
Effect of means on slopes for each combination of visualization condition and level of variance (in figure).
model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.p_sup, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(means, condition, sd_diff, .draw) %>% # group by predictors to keep
summarise(slope = mean(slope)) %>% # marginalize out other predictors by taking a weighted average
compare_levels(slope, by = means) %>% # contrast mean present - absent
mean_qi()
## # A tibble: 8 x 9
## # Groups: means, condition [4]
## means condition sd_diff slope .lower .upper .width .point .interval
## <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 TRUE - FA… densities 5 -0.0250 -0.0479 -0.00232 0.95 mean qi
## 2 TRUE - FA… densities 15 0.0385 0.0136 0.0633 0.95 mean qi
## 3 TRUE - FA… intervals 5 -0.0230 -0.0430 -0.00350 0.95 mean qi
## 4 TRUE - FA… intervals 15 0.0446 0.0245 0.0641 0.95 mean qi
## 5 TRUE - FA… HOPs 5 -0.0439 -0.0746 -0.0135 0.95 mean qi
## 6 TRUE - FA… HOPs 15 -0.0376 -0.0713 -0.00404 0.95 mean qi
## 7 TRUE - FA… QDPs 5 -0.0337 -0.0551 -0.0127 0.95 mean qi
## 8 TRUE - FA… QDPs 15 -0.00554 -0.0290 0.0180 0.95 mean qi
Effect of adding means on predicted error for each combination of visualization condition and level of variance. This helps us contextualize the impact of adding means.
model_df %>%
data_grid(lo_ground_truth, means, sd_diff, condition, trial, start_means) %>%
add_predicted_draws(m.p_sup, re_formula = NA, n = 5000, seed = 1234) %>%
mutate(est_error = plogis(.prediction) - plogis(lo_ground_truth)) %>% # calculate estimation error
compare_levels(est_error, by = means) %>% # contrast mean present - absent
group_by(means, condition, sd_diff) %>% # group by predictors to keep
mean_qi(est_error)
## # A tibble: 8 x 9
## # Groups: means, condition [4]
## means condition sd_diff est_error .lower .upper .width .point .interval
## <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 TRUE - FALSE densities 5 -0.00942 -0.236 0.216 0.95 mean qi
## 2 TRUE - FALSE densities 15 0.0102 -0.188 0.222 0.95 mean qi
## 3 TRUE - FALSE intervals 5 -0.00476 -0.236 0.222 0.95 mean qi
## 4 TRUE - FALSE intervals 15 0.0150 -0.191 0.231 0.95 mean qi
## 5 TRUE - FALSE HOPs 5 -0.00507 -0.304 0.289 0.95 mean qi
## 6 TRUE - FALSE HOPs 15 0.00360 -0.266 0.275 0.95 mean qi
## 7 TRUE - FALSE QDPs 5 -0.0128 -0.201 0.172 0.95 mean qi
## 8 TRUE - FALSE QDPs 15 -0.00133 -0.170 0.168 0.95 mean qi
Effect of means on slopes, marginalizing across visualization condition (in figure).
model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.p_sup, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(means, sd_diff, .draw) %>% # group by predictors to keep
summarise(slope = mean(slope)) %>% # marginalize out other predictors by taking a weighted average
compare_levels(slope, by = means) %>% # contrast mean present - absent
mean_qi()
## # A tibble: 2 x 8
## # Groups: means [1]
## means sd_diff slope .lower .upper .width .point .interval
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 TRUE - FALSE 5 -0.0314 -0.0437 -0.0195 0.95 mean qi
## 2 TRUE - FALSE 15 0.00999 -0.00300 0.0229 0.95 mean qi
Effect of adding means on predicted error, marginalizing across visualization conditions. This helps us contextualize the impact of adding means.
model_df %>%
data_grid(lo_ground_truth, means, sd_diff, condition, trial, start_means) %>%
add_predicted_draws(m.p_sup, re_formula = NA, n = 5000, seed = 1234) %>%
mutate(est_error = plogis(.prediction) - plogis(lo_ground_truth)) %>% # calculate estimation error
compare_levels(est_error, by = means) %>% # contrast mean present - absent
group_by(means, sd_diff) %>% # group by predictors to keep
mean_qi(est_error)
## # A tibble: 2 x 8
## # Groups: means [1]
## means sd_diff est_error .lower .upper .width .point .interval
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 TRUE - FALSE 5 -0.00802 -0.250 0.234 0.95 mean qi
## 2 TRUE - FALSE 15 0.00689 -0.211 0.231 0.95 mean qi
We preregistered comparisons of LLO slopes in each uncertainty visualization condition, marginalizing across other predictors. However, it occurred to us later that these effects are not that useful for making design recommendations. They represent uncertainty encodings that cannot be rendered: distributions which both do and do not have means added at the same time. This is a statistical abstraction that represents the effectiveness of uncertainty encodings averaging across other manipulations. As such, we present these comparisons here but omit them from the paper.
model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.p_sup, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(condition, .draw) %>% # group by predictors to keep
summarise(slope = mean(slope)) %>% # marginalize out means present/absent by taking a weighted average
ggplot(aes(x = slope, y = condition, fill = condition)) +
stat_slabh(alpha = 0.35) +
scale_fill_brewer(type = "qual", palette = 2) +
labs(subtitle = "Slopes Per Visualization Condition") +
theme_minimal() +
theme(legend.position = "none")
Let’s look at contrasts between visualization conditions to get a sense of which differences are reliable.
model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.p_sup, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(condition, .draw) %>% # group by predictors to keep
summarise(slope = mean(slope)) %>% # marginalize out means present/absent by taking a weighted average
compare_levels(slope, by = condition) %>%
# compare_levels(slope, by = condition, comparison = list(c("QDPs", "intervals"), c("QDPs", "HOPs"), c("QDPs", "densities"), c("densities", "intervals"))) %>% # show only reliable contrasts
ggplot(aes(x = slope, y = condition)) +
stat_halfeyeh() +
labs(x = "Slope Differences Between Visualization Conditions") +
theme_minimal()
The chart above shows that only the contrasts between quantile dotplots and each of the other conditions are reliable.
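To complement this visual check, here is a sketch of a numeric version: flag the pairwise contrasts whose 95% quantile intervals exclude zero. It assumes the draws from the pipeline above are saved to a data frame named slope_contrasts (a hypothetical name) just before the ggplot call.
# numeric version of the visual reliability check: a contrast is flagged
# as reliable when its 95% quantile interval excludes zero
slope_contrasts %>%
group_by(condition) %>% # after compare_levels, condition holds contrast labels like "QDPs - HOPs"
mean_qi(slope) %>%
mutate(reliable = .lower > 0 | .upper < 0)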
Slope estimates per visualization condition.
model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.p_sup, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(condition, .draw) %>% # group by predictors to keep
summarise(slope = mean(slope)) %>%
mean_qi()
## # A tibble: 4 x 7
## condition slope .lower .upper .width .point .interval
## <fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 densities 0.436 0.372 0.499 0.95 mean qi
## 2 intervals 0.350 0.289 0.413 0.95 mean qi
## 3 HOPs 0.394 0.329 0.460 0.95 mean qi
## 4 QDPs 0.566 0.503 0.631 0.95 mean qi
Predicted error per visualization condition.
model_df %>%
data_grid(lo_ground_truth, means, sd_diff, condition, trial, start_means) %>%
add_predicted_draws(m.p_sup, re_formula = NA, n = 5000, seed = 1234) %>%
mutate(est_error = plogis(.prediction) - plogis(lo_ground_truth)) %>% # calculate estimation error
group_by(condition) %>% # group by predictors to keep
mean_qi(est_error)
## # A tibble: 4 x 7
## condition est_error .lower .upper .width .point .interval
## <fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 densities -0.124 -0.362 0.0287 0.95 mean qi
## 2 intervals -0.146 -0.394 0.0295 0.95 mean qi
## 3 HOPs -0.140 -0.426 0.0689 0.95 mean qi
## 4 QDPs -0.0891 -0.270 0.0353 0.95 mean qi
Instead of the marginal effects of visualization conditions shown above, what we present in the paper are the effects of each visualization design (uncertainty encoding x means). This means that we are only marginalizing across levels of variance.
model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.p_sup, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(condition, means, .draw) %>% # group by predictors to keep
summarise(slope = mean(slope)) %>% # marginalize by taking a weighted average
ggplot(aes(x = slope, y = condition, group = means, fill = means)) +
stat_slabh(alpha = 0.35) +
labs(subtitle = "Slopes Per Visualization Design") +
theme_minimal() +
theme(legend.position = "none")
Let’s look at contrasts between visualization designs to get a sense of which differences are reliable.
model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.p_sup, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
unite("design", c(condition, means)) %>%
group_by(design, .draw) %>% # group by predictors to keep
summarise(slope = mean(slope)) %>% # marginalize by taking a weighted average
compare_levels(slope, by = design) %>%
ggplot(aes(x = slope, y = design)) +
stat_halfeyeh() +
labs(x = "Slope Differences Between Visualization Designs") +
theme_minimal()
Quantile dotplots outperform every other condition, with or without means added. Densities with and without means are reliably better than intervals without means. HOPs are not reliably different from intervals or densities, with or without means added. The effect of adding means is only reliable for HOPs, but we can see below that the predicted error only changes by a negligible 0.08 percentage points in terms of probability of superiority.
Effect of means on slopes, marginalizing across levels of variance (in figure).
model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.p_sup, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(means, condition, .draw) %>% # group by predictors to keep
summarise(slope = mean(slope)) %>% # marginalize by taking a weighted average
compare_levels(slope, by = means) %>% # contrast mean present - absent
mean_qi()
## # A tibble: 4 x 8
## # Groups: means [1]
## means condition slope .lower .upper .width .point .interval
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 TRUE - FALSE densities 0.00677 -0.0125 0.0256 0.95 mean qi
## 2 TRUE - FALSE intervals 0.0108 -0.00543 0.0267 0.95 mean qi
## 3 TRUE - FALSE HOPs -0.0408 -0.0669 -0.0146 0.95 mean qi
## 4 TRUE - FALSE QDPs -0.0196 -0.0381 -0.00105 0.95 mean qi
Predicted error with and without means, marginalizing across levels of variance. This gives us a sense of visualization effectiveness.
model_df %>%
data_grid(lo_ground_truth, means, sd_diff, condition, trial, start_means) %>%
add_predicted_draws(m.p_sup, re_formula = NA, n = 5000, seed = 1234) %>%
mutate(est_error = plogis(.prediction) - plogis(lo_ground_truth)) %>% # calculate estimation error
group_by(means, condition) %>% # group by predictors to keep
mean_qi(est_error)
## # A tibble: 8 x 8
## # Groups: means [2]
## means condition est_error .lower .upper .width .point .interval
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 FALSE densities -0.124 -0.369 0.0317 0.95 mean qi
## 2 FALSE intervals -0.148 -0.403 0.0313 0.95 mean qi
## 3 FALSE HOPs -0.140 -0.432 0.0713 0.95 mean qi
## 4 FALSE QDPs -0.0856 -0.267 0.0402 0.95 mean qi
## 5 TRUE densities -0.124 -0.354 0.0256 0.95 mean qi
## 6 TRUE intervals -0.143 -0.385 0.0277 0.95 mean qi
## 7 TRUE HOPs -0.140 -0.419 0.0665 0.95 mean qi
## 8 TRUE QDPs -0.0926 -0.273 0.0295 0.95 mean qi
Let’s look at the marginal effect of high vs. low variance on LLO slopes. This is an exploratory comparison that we do not present in the paper.
model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.p_sup, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(sd_diff, .draw) %>% # group by predictors to keep
summarise(slope = mean(slope)) %>% # marginalize by taking a weighted average
compare_levels(slope, by = sd_diff) %>%
ggplot(aes(x = slope, y = "Effect of Variance")) +
stat_halfeyeh() +
labs(subtitle = "Difference in LLO Slopes (High - Low Variance)") +
theme_minimal() +
theme(legend.position = "none")
It looks like LLO slopes are larger at high than at low variance. One potential reason for this is that high variance stimuli use white space more efficiently, making the task easier, especially for users relying on distance as a proxy for effect size.
Let’s look at predicted magnitude estimates to help interpret LLO slopes as a metric.
model_df %>%
data_grid(lo_ground_truth, means, sd_diff, condition, start_means, trial) %>%
add_predicted_draws(m.p_sup, re_formula = NA, n = 500) %>%
ggplot(aes(x = plogis(lo_ground_truth), y = plogis(.prediction), color = means, fill = means)) +
stat_lineribbon(.width = c(.95), alpha = .25, show.legend = FALSE) +
theme_minimal() +
facet_grid(condition ~ sd_diff)
I find it hard to see the slope differences on this chart. The noise in posterior predictions swamps the signal we are able to measure using LLO slopes as a metric. This is what we are referring to when we say that LLO slopes give us greater statistical power than simpler metrics like accuracy.
We can do a little better at showing the effect of interest by removing uncertainty in the prediction, but this seems a little antithetical to the whole point of the paper.
model_df %>%
data_grid(lo_ground_truth, means, sd_diff, condition, start_means, trial) %>%
add_predicted_draws(m.p_sup, re_formula = NA, n = 500) %>%
group_by(lo_ground_truth, means, sd_diff, condition) %>% # marginalize
mutate(
ground_truth = plogis(lo_ground_truth),
avg_prediction = mean(plogis(.prediction))
) %>%
ggplot(aes(x = ground_truth, y = avg_prediction, color = means, fill = means)) +
stat_lineribbon(.width = c(.95), alpha = .35, show.legend = FALSE) +
theme_minimal() +
# coord_cartesian(ylim = c(0, 1)) +
facet_grid(condition ~ sd_diff)
We can also look at predicted errors in estimated probability of superiority to give a different view, although this isn’t much better.
model_df %>%
data_grid(lo_ground_truth, means, sd_diff, condition, start_means, trial) %>%
add_predicted_draws(m.p_sup, re_formula = NA, n = 500) %>%
mutate(est_error = plogis(.prediction) - plogis(lo_ground_truth)) %>% # calculate estimation error
ggplot(aes(x = plogis(lo_ground_truth), y = est_error, color = means, fill = means)) +
stat_lineribbon(.width = c(.95), alpha = .25, show.legend = FALSE) +
theme_minimal() +
facet_grid(condition ~ sd_diff)
In the paper, we decided to describe posterior predictions in terms of marginal predicted average error for selected comparisons. We do this to contextualize LLO slopes in terms of average error, a more familiar but less precise metric for the kind of bias we measure.
Next, we load the model of intervention decisions that we arrived at through a process of model expansion described in our preregistration (https://osf.io/9kpmb). This is a hierarchical logistic regression modeling the probability that chart users choose to pay for an intervention, based on its effect size relative to the status quo of not paying. See the paper and experiment/analysis/InterventionDecisions.Rmd in the supplemental materials for details.
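As context for the evidence scale defined in the preprocessing above: with an award worth award_value and an intervention assumed to cost one unit (our reading of the 1 / award_value term), intervening has higher expected value exactly when evidence > 0. A small worked example:
# decision rule implied by the evidence scale, assuming a unit intervention cost:
# intervene iff p_award_with * award_value - 1 > p_award_without * award_value
#           iff p_award_with > p_award_without + 1 / award_value
#           iff evidence = qlogis(p_award_with) - qlogis(p_award_without + 1 / award_value) > 0
p_award_with <- 0.7; p_award_without <- 0.6; award_value <- 20
evidence <- qlogis(p_award_with) - qlogis(p_award_without + 1 / award_value)
evidence > 0 # TRUE: expected gain 0.1 * 20 = 2 exceeds the assumed unit cost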
m.decisions <- brm(
data = model_df, family = bernoulli(link = "logit"),
formula = bf(intervene ~ (1 + evidence*means*sd_diff + evidence*trial|worker_id) + evidence*means*sd_diff*condition*start_means + evidence*condition*trial),
prior = c(prior(normal(0, 1), class = Intercept),
prior(normal(1, 1), class = b, coef = evidence),
prior(normal(0, 0.5), class = b),
prior(normal(0, 0.5), class = sd),
prior(lkj(4), class = cor)),
iter = 8000, warmup = 2000, chains = 2, cores = 2, thin = 2,
file = "model-fits/logistic_mdl-min_order-r_means_sd_trial2-long_chains")
summary(m.decisions)
## Family: bernoulli
## Links: mu = logit
## Formula: intervene ~ (1 + evidence * means * sd_diff + evidence * trial | worker_id) + evidence * means * sd_diff * condition * start_means + evidence * condition * trial
## Data: model_df (Number of observations: 19892)
## Samples: 2 chains, each with iter = 8000; warmup = 2000; thin = 2;
## total post-warmup samples = 6000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 622)
## Estimate Est.Error
## sd(Intercept) 1.86 0.09
## sd(evidence) 1.25 0.08
## sd(meansTRUE) 1.32 0.11
## sd(sd_diff15) 1.15 0.09
## sd(trial) 2.51 0.16
## sd(evidence:meansTRUE) 0.77 0.11
## sd(evidence:sd_diff15) 0.73 0.10
## sd(meansTRUE:sd_diff15) 0.73 0.16
## sd(evidence:trial) 1.51 0.16
## sd(evidence:meansTRUE:sd_diff15) 0.67 0.19
## cor(Intercept,evidence) 0.54 0.05
## cor(Intercept,meansTRUE) -0.10 0.09
## cor(evidence,meansTRUE) 0.06 0.09
## cor(Intercept,sd_diff15) -0.38 0.08
## cor(evidence,sd_diff15) -0.04 0.09
## cor(meansTRUE,sd_diff15) 0.24 0.10
## cor(Intercept,trial) 0.37 0.06
## cor(evidence,trial) 0.14 0.08
## cor(meansTRUE,trial) 0.22 0.08
## cor(sd_diff15,trial) -0.05 0.09
## cor(Intercept,evidence:meansTRUE) -0.17 0.11
## cor(evidence,evidence:meansTRUE) -0.04 0.13
## cor(meansTRUE,evidence:meansTRUE) 0.45 0.12
## cor(sd_diff15,evidence:meansTRUE) 0.29 0.12
## cor(trial,evidence:meansTRUE) 0.16 0.12
## cor(Intercept,evidence:sd_diff15) -0.38 0.11
## cor(evidence,evidence:sd_diff15) -0.08 0.12
## cor(meansTRUE,evidence:sd_diff15) -0.08 0.13
## cor(sd_diff15,evidence:sd_diff15) 0.64 0.10
## cor(trial,evidence:sd_diff15) -0.10 0.12
## cor(evidence:meansTRUE,evidence:sd_diff15) 0.16 0.15
## cor(Intercept,meansTRUE:sd_diff15) -0.10 0.15
## cor(evidence,meansTRUE:sd_diff15) 0.30 0.14
## cor(meansTRUE,meansTRUE:sd_diff15) -0.16 0.17
## cor(sd_diff15,meansTRUE:sd_diff15) 0.06 0.17
## cor(trial,meansTRUE:sd_diff15) -0.08 0.16
## cor(evidence:meansTRUE,meansTRUE:sd_diff15) 0.15 0.18
## cor(evidence:sd_diff15,meansTRUE:sd_diff15) 0.23 0.18
## cor(Intercept,evidence:trial) 0.26 0.09
## cor(evidence,evidence:trial) 0.40 0.10
## cor(meansTRUE,evidence:trial) 0.06 0.12
## cor(sd_diff15,evidence:trial) -0.18 0.11
## cor(trial,evidence:trial) 0.50 0.09
## cor(evidence:meansTRUE,evidence:trial) 0.26 0.13
## cor(evidence:sd_diff15,evidence:trial) -0.12 0.13
## cor(meansTRUE:sd_diff15,evidence:trial) 0.12 0.17
## cor(Intercept,evidence:meansTRUE:sd_diff15) -0.21 0.15
## cor(evidence,evidence:meansTRUE:sd_diff15) 0.10 0.16
## cor(meansTRUE,evidence:meansTRUE:sd_diff15) -0.01 0.18
## cor(sd_diff15,evidence:meansTRUE:sd_diff15) 0.06 0.18
## cor(trial,evidence:meansTRUE:sd_diff15) -0.08 0.16
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15) -0.07 0.19
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15) 0.09 0.19
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15) 0.47 0.18
## cor(evidence:trial,evidence:meansTRUE:sd_diff15) -0.11 0.17
## l-95% CI u-95% CI Rhat
## sd(Intercept) 1.69 2.04 1.00
## sd(evidence) 1.10 1.41 1.00
## sd(meansTRUE) 1.09 1.55 1.00
## sd(sd_diff15) 0.97 1.34 1.00
## sd(trial) 2.19 2.83 1.00
## sd(evidence:meansTRUE) 0.56 0.98 1.00
## sd(evidence:sd_diff15) 0.53 0.92 1.00
## sd(meansTRUE:sd_diff15) 0.40 1.04 1.00
## sd(evidence:trial) 1.20 1.83 1.00
## sd(evidence:meansTRUE:sd_diff15) 0.26 1.02 1.00
## cor(Intercept,evidence) 0.43 0.64 1.00
## cor(Intercept,meansTRUE) -0.27 0.08 1.00
## cor(evidence,meansTRUE) -0.11 0.23 1.00
## cor(Intercept,sd_diff15) -0.52 -0.22 1.00
## cor(evidence,sd_diff15) -0.21 0.13 1.00
## cor(meansTRUE,sd_diff15) 0.04 0.42 1.00
## cor(Intercept,trial) 0.24 0.49 1.00
## cor(evidence,trial) -0.02 0.28 1.00
## cor(meansTRUE,trial) 0.06 0.39 1.00
## cor(sd_diff15,trial) -0.23 0.13 1.00
## cor(Intercept,evidence:meansTRUE) -0.39 0.04 1.00
## cor(evidence,evidence:meansTRUE) -0.29 0.22 1.00
## cor(meansTRUE,evidence:meansTRUE) 0.22 0.68 1.00
## cor(sd_diff15,evidence:meansTRUE) 0.04 0.52 1.00
## cor(trial,evidence:meansTRUE) -0.08 0.39 1.00
## cor(Intercept,evidence:sd_diff15) -0.58 -0.16 1.00
## cor(evidence,evidence:sd_diff15) -0.31 0.17 1.00
## cor(meansTRUE,evidence:sd_diff15) -0.33 0.17 1.00
## cor(sd_diff15,evidence:sd_diff15) 0.44 0.81 1.00
## cor(trial,evidence:sd_diff15) -0.34 0.13 1.00
## cor(evidence:meansTRUE,evidence:sd_diff15) -0.14 0.44 1.00
## cor(Intercept,meansTRUE:sd_diff15) -0.40 0.19 1.00
## cor(evidence,meansTRUE:sd_diff15) 0.02 0.57 1.00
## cor(meansTRUE,meansTRUE:sd_diff15) -0.46 0.19 1.00
## cor(sd_diff15,meansTRUE:sd_diff15) -0.26 0.42 1.00
## cor(trial,meansTRUE:sd_diff15) -0.39 0.24 1.00
## cor(evidence:meansTRUE,meansTRUE:sd_diff15) -0.20 0.50 1.00
## cor(evidence:sd_diff15,meansTRUE:sd_diff15) -0.13 0.58 1.00
## cor(Intercept,evidence:trial) 0.08 0.43 1.00
## cor(evidence,evidence:trial) 0.20 0.59 1.00
## cor(meansTRUE,evidence:trial) -0.17 0.29 1.00
## cor(sd_diff15,evidence:trial) -0.38 0.04 1.00
## cor(trial,evidence:trial) 0.33 0.67 1.00
## cor(evidence:meansTRUE,evidence:trial) -0.00 0.50 1.00
## cor(evidence:sd_diff15,evidence:trial) -0.38 0.15 1.00
## cor(meansTRUE:sd_diff15,evidence:trial) -0.22 0.45 1.00
## cor(Intercept,evidence:meansTRUE:sd_diff15) -0.50 0.10 1.00
## cor(evidence,evidence:meansTRUE:sd_diff15) -0.21 0.39 1.00
## cor(meansTRUE,evidence:meansTRUE:sd_diff15) -0.36 0.34 1.00
## cor(sd_diff15,evidence:meansTRUE:sd_diff15) -0.28 0.42 1.00
## cor(trial,evidence:meansTRUE:sd_diff15) -0.40 0.24 1.00
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15) -0.42 0.33 1.00
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15) -0.26 0.49 1.00
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15) 0.07 0.76 1.00
## cor(evidence:trial,evidence:meansTRUE:sd_diff15) -0.43 0.24 1.00
## Bulk_ESS Tail_ESS
## sd(Intercept) 3190 4292
## sd(evidence) 3058 4461
## sd(meansTRUE) 1612 2990
## sd(sd_diff15) 2599 4180
## sd(trial) 2471 4138
## sd(evidence:meansTRUE) 1538 2446
## sd(evidence:sd_diff15) 1921 3090
## sd(meansTRUE:sd_diff15) 1032 2038
## sd(evidence:trial) 2350 3688
## sd(evidence:meansTRUE:sd_diff15) 848 847
## cor(Intercept,evidence) 2272 3940
## cor(Intercept,meansTRUE) 2406 3750
## cor(evidence,meansTRUE) 1949 3268
## cor(Intercept,sd_diff15) 2968 4690
## cor(evidence,sd_diff15) 2082 3333
## cor(meansTRUE,sd_diff15) 1668 2843
## cor(Intercept,trial) 2898 4233
## cor(evidence,trial) 2063 3515
## cor(meansTRUE,trial) 1933 3834
## cor(sd_diff15,trial) 1858 3241
## cor(Intercept,evidence:meansTRUE) 2846 4493
## cor(evidence,evidence:meansTRUE) 2600 3989
## cor(meansTRUE,evidence:meansTRUE) 1312 3390
## cor(sd_diff15,evidence:meansTRUE) 1636 3054
## cor(trial,evidence:meansTRUE) 2019 3719
## cor(Intercept,evidence:sd_diff15) 2821 4307
## cor(evidence,evidence:sd_diff15) 3105 4384
## cor(meansTRUE,evidence:sd_diff15) 2264 4012
## cor(sd_diff15,evidence:sd_diff15) 2212 3869
## cor(trial,evidence:sd_diff15) 2831 4078
## cor(evidence:meansTRUE,evidence:sd_diff15) 2419 3804
## cor(Intercept,meansTRUE:sd_diff15) 3867 4551
## cor(evidence,meansTRUE:sd_diff15) 3152 4670
## cor(meansTRUE,meansTRUE:sd_diff15) 2511 4118
## cor(sd_diff15,meansTRUE:sd_diff15) 1734 3474
## cor(trial,meansTRUE:sd_diff15) 2015 3868
## cor(evidence:meansTRUE,meansTRUE:sd_diff15) 1819 3396
## cor(evidence:sd_diff15,meansTRUE:sd_diff15) 1727 3493
## cor(Intercept,evidence:trial) 3068 4615
## cor(evidence,evidence:trial) 2680 4130
## cor(meansTRUE,evidence:trial) 1791 3355
## cor(sd_diff15,evidence:trial) 2246 3842
## cor(trial,evidence:trial) 2195 3860
## cor(evidence:meansTRUE,evidence:trial) 2427 3776
## cor(evidence:sd_diff15,evidence:trial) 2007 3719
## cor(meansTRUE:sd_diff15,evidence:trial) 1399 2790
## cor(Intercept,evidence:meansTRUE:sd_diff15) 3459 3849
## cor(evidence,evidence:meansTRUE:sd_diff15) 3905 4262
## cor(meansTRUE,evidence:meansTRUE:sd_diff15) 3161 3810
## cor(sd_diff15,evidence:meansTRUE:sd_diff15) 2776 4130
## cor(trial,evidence:meansTRUE:sd_diff15) 3646 4390
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15) 2565 4190
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15) 1800 3564
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15) 1630 1885
## cor(evidence:trial,evidence:meansTRUE:sd_diff15) 2625 4330
##
## Population-Level Effects:
## Estimate
## Intercept 0.33
## evidence 2.15
## meansTRUE -0.41
## sd_diff15 1.07
## conditionHOPs -0.26
## conditionintervals -0.33
## conditionQDPs 0.31
## start_meansTRUE -0.50
## trial 1.26
## evidence:meansTRUE -0.12
## evidence:sd_diff15 0.61
## meansTRUE:sd_diff15 0.64
## evidence:conditionHOPs -0.19
## evidence:conditionintervals -0.19
## evidence:conditionQDPs 0.31
## meansTRUE:conditionHOPs 0.01
## meansTRUE:conditionintervals -0.01
## meansTRUE:conditionQDPs -0.33
## sd_diff15:conditionHOPs 0.46
## sd_diff15:conditionintervals 0.36
## sd_diff15:conditionQDPs 0.09
## evidence:start_meansTRUE -0.50
## meansTRUE:start_meansTRUE 0.49
## sd_diff15:start_meansTRUE 0.50
## conditionHOPs:start_meansTRUE -0.37
## conditionintervals:start_meansTRUE -0.29
## conditionQDPs:start_meansTRUE 0.22
## evidence:trial 1.71
## conditionHOPs:trial -0.02
## conditionintervals:trial 0.65
## conditionQDPs:trial 0.33
## evidence:meansTRUE:sd_diff15 0.04
## evidence:meansTRUE:conditionHOPs -0.27
## evidence:meansTRUE:conditionintervals 0.16
## evidence:meansTRUE:conditionQDPs 0.05
## evidence:sd_diff15:conditionHOPs 0.09
## evidence:sd_diff15:conditionintervals 0.33
## evidence:sd_diff15:conditionQDPs 0.20
## meansTRUE:sd_diff15:conditionHOPs -0.53
## meansTRUE:sd_diff15:conditionintervals 0.55
## meansTRUE:sd_diff15:conditionQDPs 0.25
## evidence:meansTRUE:start_meansTRUE 0.38
## evidence:sd_diff15:start_meansTRUE 0.21
## meansTRUE:sd_diff15:start_meansTRUE -0.05
## evidence:conditionHOPs:start_meansTRUE -0.35
## evidence:conditionintervals:start_meansTRUE 0.13
## evidence:conditionQDPs:start_meansTRUE -0.10
## meansTRUE:conditionHOPs:start_meansTRUE 0.16
## meansTRUE:conditionintervals:start_meansTRUE 0.14
## meansTRUE:conditionQDPs:start_meansTRUE 0.01
## sd_diff15:conditionHOPs:start_meansTRUE 0.13
## sd_diff15:conditionintervals:start_meansTRUE -0.33
## sd_diff15:conditionQDPs:start_meansTRUE -0.25
## evidence:conditionHOPs:trial -0.44
## evidence:conditionintervals:trial 0.68
## evidence:conditionQDPs:trial 0.39
## evidence:meansTRUE:sd_diff15:conditionHOPs -0.46
## evidence:meansTRUE:sd_diff15:conditionintervals 0.25
## evidence:meansTRUE:sd_diff15:conditionQDPs -0.07
## evidence:meansTRUE:sd_diff15:start_meansTRUE 0.31
## evidence:meansTRUE:conditionHOPs:start_meansTRUE 0.54
## evidence:meansTRUE:conditionintervals:start_meansTRUE -0.01
## evidence:meansTRUE:conditionQDPs:start_meansTRUE 0.15
## evidence:sd_diff15:conditionHOPs:start_meansTRUE 0.11
## evidence:sd_diff15:conditionintervals:start_meansTRUE -0.39
## evidence:sd_diff15:conditionQDPs:start_meansTRUE -0.01
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.34
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE -0.26
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE -0.17
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.27
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.29
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE -0.05
## Est.Error
## Intercept 0.17
## evidence 0.15
## meansTRUE 0.18
## sd_diff15 0.16
## conditionHOPs 0.23
## conditionintervals 0.23
## conditionQDPs 0.23
## start_meansTRUE 0.22
## trial 0.22
## evidence:meansTRUE 0.17
## evidence:sd_diff15 0.16
## meansTRUE:sd_diff15 0.19
## evidence:conditionHOPs 0.19
## evidence:conditionintervals 0.19
## evidence:conditionQDPs 0.20
## meansTRUE:conditionHOPs 0.25
## meansTRUE:conditionintervals 0.25
## meansTRUE:conditionQDPs 0.25
## sd_diff15:conditionHOPs 0.22
## sd_diff15:conditionintervals 0.21
## sd_diff15:conditionQDPs 0.22
## evidence:start_meansTRUE 0.19
## meansTRUE:start_meansTRUE 0.23
## sd_diff15:start_meansTRUE 0.20
## conditionHOPs:start_meansTRUE 0.31
## conditionintervals:start_meansTRUE 0.30
## conditionQDPs:start_meansTRUE 0.30
## evidence:trial 0.22
## conditionHOPs:trial 0.31
## conditionintervals:trial 0.31
## conditionQDPs:trial 0.32
## evidence:meansTRUE:sd_diff15 0.21
## evidence:meansTRUE:conditionHOPs 0.22
## evidence:meansTRUE:conditionintervals 0.23
## evidence:meansTRUE:conditionQDPs 0.24
## evidence:sd_diff15:conditionHOPs 0.20
## evidence:sd_diff15:conditionintervals 0.21
## evidence:sd_diff15:conditionQDPs 0.22
## meansTRUE:sd_diff15:conditionHOPs 0.25
## meansTRUE:sd_diff15:conditionintervals 0.26
## meansTRUE:sd_diff15:conditionQDPs 0.26
## evidence:meansTRUE:start_meansTRUE 0.22
## evidence:sd_diff15:start_meansTRUE 0.20
## meansTRUE:sd_diff15:start_meansTRUE 0.23
## evidence:conditionHOPs:start_meansTRUE 0.26
## evidence:conditionintervals:start_meansTRUE 0.27
## evidence:conditionQDPs:start_meansTRUE 0.28
## meansTRUE:conditionHOPs:start_meansTRUE 0.32
## meansTRUE:conditionintervals:start_meansTRUE 0.32
## meansTRUE:conditionQDPs:start_meansTRUE 0.32
## sd_diff15:conditionHOPs:start_meansTRUE 0.28
## sd_diff15:conditionintervals:start_meansTRUE 0.28
## sd_diff15:conditionQDPs:start_meansTRUE 0.28
## evidence:conditionHOPs:trial 0.29
## evidence:conditionintervals:trial 0.29
## evidence:conditionQDPs:trial 0.30
## evidence:meansTRUE:sd_diff15:conditionHOPs 0.25
## evidence:meansTRUE:sd_diff15:conditionintervals 0.27
## evidence:meansTRUE:sd_diff15:conditionQDPs 0.27
## evidence:meansTRUE:sd_diff15:start_meansTRUE 0.24
## evidence:meansTRUE:conditionHOPs:start_meansTRUE 0.29
## evidence:meansTRUE:conditionintervals:start_meansTRUE 0.31
## evidence:meansTRUE:conditionQDPs:start_meansTRUE 0.31
## evidence:sd_diff15:conditionHOPs:start_meansTRUE 0.27
## evidence:sd_diff15:conditionintervals:start_meansTRUE 0.27
## evidence:sd_diff15:conditionQDPs:start_meansTRUE 0.28
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.32
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.32
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.32
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.31
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.33
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.33
## l-95% CI
## Intercept -0.00
## evidence 1.87
## meansTRUE -0.77
## sd_diff15 0.77
## conditionHOPs -0.71
## conditionintervals -0.78
## conditionQDPs -0.14
## start_meansTRUE -0.93
## trial 0.84
## evidence:meansTRUE -0.45
## evidence:sd_diff15 0.30
## meansTRUE:sd_diff15 0.28
## evidence:conditionHOPs -0.57
## evidence:conditionintervals -0.57
## evidence:conditionQDPs -0.08
## meansTRUE:conditionHOPs -0.49
## meansTRUE:conditionintervals -0.49
## meansTRUE:conditionQDPs -0.82
## sd_diff15:conditionHOPs 0.04
## sd_diff15:conditionintervals -0.05
## sd_diff15:conditionQDPs -0.35
## evidence:start_meansTRUE -0.88
## meansTRUE:start_meansTRUE 0.02
## sd_diff15:start_meansTRUE 0.11
## conditionHOPs:start_meansTRUE -0.98
## conditionintervals:start_meansTRUE -0.88
## conditionQDPs:start_meansTRUE -0.36
## evidence:trial 1.28
## conditionHOPs:trial -0.63
## conditionintervals:trial 0.06
## conditionQDPs:trial -0.29
## evidence:meansTRUE:sd_diff15 -0.37
## evidence:meansTRUE:conditionHOPs -0.70
## evidence:meansTRUE:conditionintervals -0.29
## evidence:meansTRUE:conditionQDPs -0.42
## evidence:sd_diff15:conditionHOPs -0.31
## evidence:sd_diff15:conditionintervals -0.08
## evidence:sd_diff15:conditionQDPs -0.22
## meansTRUE:sd_diff15:conditionHOPs -1.00
## meansTRUE:sd_diff15:conditionintervals 0.04
## meansTRUE:sd_diff15:conditionQDPs -0.27
## evidence:meansTRUE:start_meansTRUE -0.05
## evidence:sd_diff15:start_meansTRUE -0.17
## meansTRUE:sd_diff15:start_meansTRUE -0.50
## evidence:conditionHOPs:start_meansTRUE -0.87
## evidence:conditionintervals:start_meansTRUE -0.39
## evidence:conditionQDPs:start_meansTRUE -0.65
## meansTRUE:conditionHOPs:start_meansTRUE -0.45
## meansTRUE:conditionintervals:start_meansTRUE -0.48
## meansTRUE:conditionQDPs:start_meansTRUE -0.63
## sd_diff15:conditionHOPs:start_meansTRUE -0.41
## sd_diff15:conditionintervals:start_meansTRUE -0.88
## sd_diff15:conditionQDPs:start_meansTRUE -0.81
## evidence:conditionHOPs:trial -1.01
## evidence:conditionintervals:trial 0.12
## evidence:conditionQDPs:trial -0.19
## evidence:meansTRUE:sd_diff15:conditionHOPs -0.95
## evidence:meansTRUE:sd_diff15:conditionintervals -0.27
## evidence:meansTRUE:sd_diff15:conditionQDPs -0.60
## evidence:meansTRUE:sd_diff15:start_meansTRUE -0.16
## evidence:meansTRUE:conditionHOPs:start_meansTRUE -0.04
## evidence:meansTRUE:conditionintervals:start_meansTRUE -0.62
## evidence:meansTRUE:conditionQDPs:start_meansTRUE -0.47
## evidence:sd_diff15:conditionHOPs:start_meansTRUE -0.40
## evidence:sd_diff15:conditionintervals:start_meansTRUE -0.93
## evidence:sd_diff15:conditionQDPs:start_meansTRUE -0.56
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE -0.27
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE -0.88
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE -0.78
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE -0.35
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE -0.35
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE -0.69
## u-95% CI Rhat
## Intercept 0.66 1.00
## evidence 2.45 1.00
## meansTRUE -0.06 1.00
## sd_diff15 1.37 1.00
## conditionHOPs 0.19 1.00
## conditionintervals 0.13 1.00
## conditionQDPs 0.76 1.00
## start_meansTRUE -0.07 1.00
## trial 1.71 1.00
## evidence:meansTRUE 0.22 1.00
## evidence:sd_diff15 0.92 1.00
## meansTRUE:sd_diff15 1.01 1.00
## evidence:conditionHOPs 0.19 1.00
## evidence:conditionintervals 0.19 1.00
## evidence:conditionQDPs 0.70 1.00
## meansTRUE:conditionHOPs 0.48 1.00
## meansTRUE:conditionintervals 0.49 1.00
## meansTRUE:conditionQDPs 0.17 1.00
## sd_diff15:conditionHOPs 0.89 1.00
## sd_diff15:conditionintervals 0.78 1.00
## sd_diff15:conditionQDPs 0.51 1.00
## evidence:start_meansTRUE -0.12 1.00
## meansTRUE:start_meansTRUE 0.94 1.00
## sd_diff15:start_meansTRUE 0.89 1.00
## conditionHOPs:start_meansTRUE 0.24 1.00
## conditionintervals:start_meansTRUE 0.30 1.00
## conditionQDPs:start_meansTRUE 0.83 1.00
## evidence:trial 2.12 1.00
## conditionHOPs:trial 0.57 1.00
## conditionintervals:trial 1.27 1.00
## conditionQDPs:trial 0.95 1.00
## evidence:meansTRUE:sd_diff15 0.44 1.00
## evidence:meansTRUE:conditionHOPs 0.15 1.00
## evidence:meansTRUE:conditionintervals 0.62 1.00
## evidence:meansTRUE:conditionQDPs 0.51 1.00
## evidence:sd_diff15:conditionHOPs 0.49 1.00
## evidence:sd_diff15:conditionintervals 0.74 1.00
## evidence:sd_diff15:conditionQDPs 0.63 1.00
## meansTRUE:sd_diff15:conditionHOPs -0.05 1.00
## meansTRUE:sd_diff15:conditionintervals 1.05 1.00
## meansTRUE:sd_diff15:conditionQDPs 0.78 1.00
## evidence:meansTRUE:start_meansTRUE 0.81 1.00
## evidence:sd_diff15:start_meansTRUE 0.59 1.00
## meansTRUE:sd_diff15:start_meansTRUE 0.39 1.00
## evidence:conditionHOPs:start_meansTRUE 0.16 1.00
## evidence:conditionintervals:start_meansTRUE 0.66 1.00
## evidence:conditionQDPs:start_meansTRUE 0.44 1.00
## meansTRUE:conditionHOPs:start_meansTRUE 0.78 1.00
## meansTRUE:conditionintervals:start_meansTRUE 0.76 1.00
## meansTRUE:conditionQDPs:start_meansTRUE 0.62 1.00
## sd_diff15:conditionHOPs:start_meansTRUE 0.68 1.00
## sd_diff15:conditionintervals:start_meansTRUE 0.22 1.00
## sd_diff15:conditionQDPs:start_meansTRUE 0.31 1.00
## evidence:conditionHOPs:trial 0.11 1.00
## evidence:conditionintervals:trial 1.25 1.00
## evidence:conditionQDPs:trial 0.97 1.00
## evidence:meansTRUE:sd_diff15:conditionHOPs 0.03 1.00
## evidence:meansTRUE:sd_diff15:conditionintervals 0.77 1.00
## evidence:meansTRUE:sd_diff15:conditionQDPs 0.47 1.00
## evidence:meansTRUE:sd_diff15:start_meansTRUE 0.77 1.00
## evidence:meansTRUE:conditionHOPs:start_meansTRUE 1.11 1.00
## evidence:meansTRUE:conditionintervals:start_meansTRUE 0.60 1.00
## evidence:meansTRUE:conditionQDPs:start_meansTRUE 0.77 1.00
## evidence:sd_diff15:conditionHOPs:start_meansTRUE 0.62 1.00
## evidence:sd_diff15:conditionintervals:start_meansTRUE 0.16 1.00
## evidence:sd_diff15:conditionQDPs:start_meansTRUE 0.54 1.00
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.95 1.00
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.39 1.00
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.47 1.00
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.88 1.00
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.93 1.00
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.61 1.00
## Bulk_ESS
## Intercept 2871
## evidence 2786
## meansTRUE 3987
## sd_diff15 3702
## conditionHOPs 2996
## conditionintervals 3053
## conditionQDPs 3122
## start_meansTRUE 3228
## trial 4422
## evidence:meansTRUE 3939
## evidence:sd_diff15 3157
## meansTRUE:sd_diff15 4189
## evidence:conditionHOPs 3249
## evidence:conditionintervals 3086
## evidence:conditionQDPs 3351
## meansTRUE:conditionHOPs 4807
## meansTRUE:conditionintervals 4322
## meansTRUE:conditionQDPs 4868
## sd_diff15:conditionHOPs 3502
## sd_diff15:conditionintervals 4409
## sd_diff15:conditionQDPs 4067
## evidence:start_meansTRUE 3392
## meansTRUE:start_meansTRUE 4654
## sd_diff15:start_meansTRUE 4145
## conditionHOPs:start_meansTRUE 3588
## conditionintervals:start_meansTRUE 3593
## conditionQDPs:start_meansTRUE 3362
## evidence:trial 4500
## conditionHOPs:trial 4810
## conditionintervals:trial 4997
## conditionQDPs:trial 4993
## evidence:meansTRUE:sd_diff15 3630
## evidence:meansTRUE:conditionHOPs 4796
## evidence:meansTRUE:conditionintervals 4310
## evidence:meansTRUE:conditionQDPs 4424
## evidence:sd_diff15:conditionHOPs 4607
## evidence:sd_diff15:conditionintervals 4113
## evidence:sd_diff15:conditionQDPs 4403
## meansTRUE:sd_diff15:conditionHOPs 4949
## meansTRUE:sd_diff15:conditionintervals 4750
## meansTRUE:sd_diff15:conditionQDPs 4967
## evidence:meansTRUE:start_meansTRUE 4589
## evidence:sd_diff15:start_meansTRUE 3544
## meansTRUE:sd_diff15:start_meansTRUE 4248
## evidence:conditionHOPs:start_meansTRUE 3795
## evidence:conditionintervals:start_meansTRUE 3828
## evidence:conditionQDPs:start_meansTRUE 3728
## meansTRUE:conditionHOPs:start_meansTRUE 5141
## meansTRUE:conditionintervals:start_meansTRUE 5186
## meansTRUE:conditionQDPs:start_meansTRUE 5088
## sd_diff15:conditionHOPs:start_meansTRUE 3921
## sd_diff15:conditionintervals:start_meansTRUE 4202
## sd_diff15:conditionQDPs:start_meansTRUE 4515
## evidence:conditionHOPs:trial 4765
## evidence:conditionintervals:trial 5027
## evidence:conditionQDPs:trial 4559
## evidence:meansTRUE:sd_diff15:conditionHOPs 4527
## evidence:meansTRUE:sd_diff15:conditionintervals 4891
## evidence:meansTRUE:sd_diff15:conditionQDPs 4796
## evidence:meansTRUE:sd_diff15:start_meansTRUE 4210
## evidence:meansTRUE:conditionHOPs:start_meansTRUE 4561
## evidence:meansTRUE:conditionintervals:start_meansTRUE 4378
## evidence:meansTRUE:conditionQDPs:start_meansTRUE 4901
## evidence:sd_diff15:conditionHOPs:start_meansTRUE 4831
## evidence:sd_diff15:conditionintervals:start_meansTRUE 4245
## evidence:sd_diff15:conditionQDPs:start_meansTRUE 4892
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 4779
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 4387
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 5099
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 4464
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 5115
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 4681
## Tail_ESS
## Intercept 4164
## evidence 4202
## meansTRUE 5148
## sd_diff15 4612
## conditionHOPs 3740
## conditionintervals 3933
## conditionQDPs 4430
## start_meansTRUE 4260
## trial 5024
## evidence:meansTRUE 4624
## evidence:sd_diff15 4407
## meansTRUE:sd_diff15 4986
## evidence:conditionHOPs 4303
## evidence:conditionintervals 4268
## evidence:conditionQDPs 4877
## meansTRUE:conditionHOPs 5366
## meansTRUE:conditionintervals 5005
## meansTRUE:conditionQDPs 5242
## sd_diff15:conditionHOPs 4354
## sd_diff15:conditionintervals 5051
## sd_diff15:conditionQDPs 4475
## evidence:start_meansTRUE 3776
## meansTRUE:start_meansTRUE 5044
## sd_diff15:start_meansTRUE 5328
## conditionHOPs:start_meansTRUE 4562
## conditionintervals:start_meansTRUE 4878
## conditionQDPs:start_meansTRUE 4445
## evidence:trial 4754
## conditionHOPs:trial 5275
## conditionintervals:trial 5009
## conditionQDPs:trial 5487
## evidence:meansTRUE:sd_diff15 5179
## evidence:meansTRUE:conditionHOPs 5121
## evidence:meansTRUE:conditionintervals 5001
## evidence:meansTRUE:conditionQDPs 5168
## evidence:sd_diff15:conditionHOPs 5324
## evidence:sd_diff15:conditionintervals 5088
## evidence:sd_diff15:conditionQDPs 4921
## meansTRUE:sd_diff15:conditionHOPs 5156
## meansTRUE:sd_diff15:conditionintervals 5196
## meansTRUE:sd_diff15:conditionQDPs 5265
## evidence:meansTRUE:start_meansTRUE 5128
## evidence:sd_diff15:start_meansTRUE 4911
## meansTRUE:sd_diff15:start_meansTRUE 5010
## evidence:conditionHOPs:start_meansTRUE 4507
## evidence:conditionintervals:start_meansTRUE 4957
## evidence:conditionQDPs:start_meansTRUE 4697
## meansTRUE:conditionHOPs:start_meansTRUE 5334
## meansTRUE:conditionintervals:start_meansTRUE 5242
## meansTRUE:conditionQDPs:start_meansTRUE 5336
## sd_diff15:conditionHOPs:start_meansTRUE 5153
## sd_diff15:conditionintervals:start_meansTRUE 4665
## sd_diff15:conditionQDPs:start_meansTRUE 4711
## evidence:conditionHOPs:trial 5172
## evidence:conditionintervals:trial 5503
## evidence:conditionQDPs:trial 5431
## evidence:meansTRUE:sd_diff15:conditionHOPs 5096
## evidence:meansTRUE:sd_diff15:conditionintervals 4985
## evidence:meansTRUE:sd_diff15:conditionQDPs 5409
## evidence:meansTRUE:sd_diff15:start_meansTRUE 4878
## evidence:meansTRUE:conditionHOPs:start_meansTRUE 5050
## evidence:meansTRUE:conditionintervals:start_meansTRUE 4924
## evidence:meansTRUE:conditionQDPs:start_meansTRUE 5110
## evidence:sd_diff15:conditionHOPs:start_meansTRUE 5468
## evidence:sd_diff15:conditionintervals:start_meansTRUE 5299
## evidence:sd_diff15:conditionQDPs:start_meansTRUE 5508
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 5257
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 5021
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 5457
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 5161
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 5433
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 5510
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Our research questions are about the points of subjective equality (PSE) and just-noticeable differences (JND) for this logistic regression model. We derive estimates of these two statistics from the model’s posterior distribution.
# get slopes from linear model
slopes_df <- model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(evidence = c(0, 1)) %>%
add_fitted_draws(m.decisions, re_formula = NA, scale = "linear", seed = 1234) %>%
compare_levels(.value, by = evidence) %>%
rename(slope = .value)
# get intercepts from linear model
intercepts_df <- model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(evidence = 0) %>%
add_fitted_draws(m.decisions, re_formula = NA, scale = "linear", seed = 1234) %>%
rename(intercept = .value)
# join dataframes for slopes and intercepts, calculate PSE and JND
stats_df <- slopes_df %>%
full_join(intercepts_df, by = c("means", "sd_diff", "condition", "trial", "start_means", ".draw")) %>%
mutate(
# evidence units
pse = -intercept / slope,
jnd = qlogis(0.75) / slope,
# probabilities of winning with the new player
pse_p_award = exp(pse) / (1 / (unique(model_df$baseline) + 1 / unique(model_df$award_value)) - 1 + exp(pse)) - unique(model_df$baseline) - 1 / unique(model_df$award_value),
jnd_p_award = exp(jnd) / (1 / (unique(model_df$baseline) + 1 / unique(model_df$award_value)) - 1 + exp(jnd)) - unique(model_df$baseline) - 1 / unique(model_df$award_value)
)
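As a quick sanity check on these formulas (using made-up coefficients, not model estimates): the model implies P(intervene) = plogis(intercept + slope * evidence), so the PSE is the evidence value at which the intervention probability is 0.5, and one JND above the PSE it is 0.75.
# sanity check with hypothetical coefficients (not model estimates)
intercept <- -0.8
slope <- 2
pse <- -intercept / slope
jnd <- qlogis(0.75) / slope
plogis(intercept + slope * pse) # 0.5: the indifference point
plogis(intercept + slope * (pse + jnd)) # 0.75: one JND above the PSE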
PSEs describe a chart user’s bias toward or against intervening relative to the utility-optimal decision criterion on the evidence scale (a proxy for effect size described in the paper).
Let’s take a look at the interaction effects on PSE of adding means at different levels of variance.
stats_df %>%
group_by(means, sd_diff, condition, .draw) %>% # marginalize out other manipulations
summarise(pse = mean(pse)) %>%
ggplot(aes(x = pse, y = condition, group = means, fill = means)) +
stat_slabh(alpha = 0.35) +
labs(subtitle = "PSE Interaction") +
theme_minimal() +
facet_grid(sd_diff ~ .)
Let’s look at contrasts for the impact of the mean.
stats_df %>%
group_by(means, sd_diff, condition, .draw) %>% # marginalize out other manipulations
summarise(pse = mean(pse)) %>%
compare_levels(pse, by = means) %>%
ggplot(aes(x = pse, y = condition)) +
stat_halfeyeh() +
labs(subtitle = "Difference in PSE (Means present - absent)") +
theme_minimal() +
facet_grid(sd_diff ~ .)
In terms of the direction of effect, extrinsic means seem to consistently bias PSE toward intervention at high variance and away from intervention at low variance. This has the impact of exacerbating biases in decisions compared to when means are absent (with the exception of quantile dotplots at low variance). However, these effects of adding the mean only appear to be reliable for quantile dotplots at low variance and for intervals, and perhaps densities, at high variance. We suspect that more data would shrink the uncertainty in these estimates, revealing this to be a persistent trend.
Quantile dotplots are slightly different from other charts in that they are the only uncertainty encoding that consistently biases users toward intervention, regardless of the level of variance. This means that the positive impact on PSE induced by adding means at low variance is debiasing for quantile dotplots, which is the only case where we can say that adding means is reliably helpful for decision-making.
Effect of means on PSE for each combination of visualization condition and level of variance (in figure).
pse_tbl <- stats_df %>%
group_by(means, sd_diff, condition, .draw) %>% # marginalize out other manipulations
summarise(pse = mean(pse)) %>%
compare_levels(pse, by = means) %>%
mean_qi()
pse_p_tbl <- stats_df %>%
group_by(means, sd_diff, condition, .draw) %>% # marginalize out other manipulations
summarise(pse_p_award = mean(pse_p_award)) %>%
compare_levels(pse_p_award, by = means) %>%
mean_qi()
pse_tbl %>% full_join(pse_p_tbl, by = c("means", "sd_diff", "condition"))
## # A tibble: 8 x 15
## # Groups: means, sd_diff [2]
## means sd_diff condition pse .lower.x .upper.x .width.x .point.x
## <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 TRUE… 5 densities 0.133 -0.0299 0.314 0.95 mean
## 2 TRUE… 5 intervals 0.125 -0.144 0.433 0.95 mean
## 3 TRUE… 5 HOPs 0.0908 -0.160 0.353 0.95 mean
## 4 TRUE… 5 QDPs 0.252 0.0816 0.435 0.95 mean
## 5 TRUE… 15 densities -0.116 -0.238 0.00174 0.95 mean
## 6 TRUE… 15 intervals -0.160 -0.277 -0.0400 0.95 mean
## 7 TRUE… 15 HOPs -0.0984 -0.267 0.0603 0.95 mean
## 8 TRUE… 15 QDPs -0.0429 -0.156 0.0727 0.95 mean
## # … with 7 more variables: .interval.x <chr>, pse_p_award <dbl>,
## # .lower.y <dbl>, .upper.y <dbl>, .width.y <dbl>, .point.y <chr>,
## # .interval.y <chr>
PSE with and without means added for each combination of visualization condition and level of variance. These numbers help us explain the nuanced differences in PSE between visualization designs in the paper.
stats_df %>%
group_by(means, sd_diff, condition, .draw) %>% # marginalize out other manipulations
summarise(
pse = mean(pse),
pse_p_award = mean(pse_p_award)
) %>%
mean_qi()
## # A tibble: 16 x 12
## # Groups: means, sd_diff [4]
## means sd_diff condition pse pse.lower pse.upper pse_p_award
## <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 FALSE 5 densities -0.0335 -0.188 0.127 -0.00611
## 2 FALSE 5 intervals 0.292 0.0775 0.548 0.0350
## 3 FALSE 5 HOPs 0.255 0.0210 0.531 0.0324
## 4 FALSE 5 QDPs -0.213 -0.360 -0.0637 -0.0356
## 5 FALSE 15 densities -0.530 -0.643 -0.418 -0.0936
## 6 FALSE 15 intervals -0.423 -0.541 -0.302 -0.0727
## 7 FALSE 15 HOPs -0.625 -0.759 -0.496 -0.113
## 8 FALSE 15 QDPs -0.573 -0.676 -0.470 -0.102
## 9 TRUE 5 densities 0.0994 -0.0969 0.323 0.0110
## 10 TRUE 5 intervals 0.417 0.158 0.748 0.0420
## 11 TRUE 5 HOPs 0.346 0.0719 0.674 0.0388
## 12 TRUE 5 QDPs 0.0395 -0.137 0.242 0.00339
## 13 TRUE 15 densities -0.646 -0.770 -0.524 -0.117
## 14 TRUE 15 intervals -0.583 -0.693 -0.477 -0.105
## 15 TRUE 15 HOPs -0.723 -0.888 -0.562 -0.134
## 16 TRUE 15 QDPs -0.616 -0.729 -0.501 -0.111
## # … with 5 more variables: pse_p_award.lower <dbl>, pse_p_award.upper <dbl>,
## # .width <dbl>, .point <chr>, .interval <chr>
Effect of means on PSE, marginalizing across visualization condition (in figure).
pse_tbl <- stats_df %>%
group_by(means, sd_diff, .draw) %>%
summarise(pse = mean(pse)) %>%
compare_levels(pse, by = means) %>%
mean_qi()
pse_p_tbl <- stats_df %>%
group_by(means, sd_diff, .draw) %>%
summarise(pse_p_award = mean(pse_p_award)) %>%
compare_levels(pse_p_award, by = means) %>%
mean_qi()
pse_tbl %>% full_join(pse_p_tbl, by = c("means", "sd_diff"))
## # A tibble: 2 x 14
## # Groups: means [1]
## means sd_diff pse .lower.x .upper.x .width.x .point.x .interval.x
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 TRUE… 5 0.150 0.0271 0.281 0.95 mean qi
## 2 TRUE… 15 -0.105 -0.175 -0.0358 0.95 mean qi
## # … with 6 more variables: pse_p_award <dbl>, .lower.y <dbl>, .upper.y <dbl>,
## # .width.y <dbl>, .point.y <chr>, .interval.y <chr>
PSE with and without means, marginalizing across visualization conditions. These numbers help us explain the aggregate effect of adding means on decision quality at each level of variance.
stats_df %>%
group_by(means, sd_diff, .draw) %>% # marginalize out other manipulations
summarise(
pse = mean(pse),
pse_p_award = mean(pse_p_award)
) %>%
mean_qi()
## # A tibble: 4 x 11
## # Groups: means [2]
## means sd_diff pse pse.lower pse.upper pse_p_award pse_p_award.low…
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FALSE 5 0.0750 -0.0344 0.197 0.00642 -0.00849
## 2 FALSE 15 -0.538 -0.599 -0.476 -0.0954 -0.108
## 3 TRUE 5 0.225 0.0936 0.366 0.0238 0.00821
## 4 TRUE 15 -0.642 -0.712 -0.575 -0.117 -0.132
## # … with 4 more variables: pse_p_award.upper <dbl>, .width <dbl>, .point <chr>,
## # .interval <chr>
Let’s visualize the effect of means on PSE, marginalizing across visualization conditions, since this is particularly important and isn’t shown clearly above.
stats_df %>%
group_by(means, sd_diff, .draw) %>% # marginalize out other manipulations
summarise(pse = mean(pse)) %>%
compare_levels(pse, by = means) %>%
ggplot(aes(x = pse, y = sd_diff)) +
stat_halfeyeh() +
labs(subtitle = "Difference in PSE (Means present - absent)") +
theme_minimal()
It looks like the effect of means is reliable if we marginalize across visualization conditions, which lends credence to the argument that this effect is robust.
We preregistered comparisons between estimates of PSE per visualization, marginalizing across other manipulations. However, it occurs to us in hindsight that this marginalization corresponds to a visualization that designers cannot render: a chart both with and without means at the same time. Therefore, we omit these comparisons from the paper and present them only in supplemental materials.
stats_df %>%
group_by(condition, .draw) %>% # marginalize out other manipulations
summarise(pse = mean(pse)) %>%
ggplot(aes(x = pse, y = condition, fill = condition)) +
stat_slabh(alpha = 0.35) +
scale_fill_brewer(type = "qual", palette = 2) +
labs(subtitle = "PSE Per Visualization Condition") +
theme_minimal() +
theme(legend.position = "none")
stats_df %>%
group_by(condition, .draw) %>%
summarise(
pse = mean(pse),
pse_p_award = mean(pse_p_award)
) %>%
mean_qi()
## # A tibble: 4 x 10
## condition pse pse.lower pse.upper pse_p_award pse_p_award.low…
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 densities -0.277 -0.392 -0.153 -0.0515 -0.0711
## 2 intervals -0.0746 -0.213 0.0795 -0.0252 -0.0441
## 3 HOPs -0.187 -0.342 -0.0135 -0.0440 -0.0684
## 4 QDPs -0.340 -0.447 -0.228 -0.0614 -0.0804
## # … with 4 more variables: pse_p_award.upper <dbl>, .width <dbl>, .point <chr>,
## # .interval <chr>
Let’s look at contrasts between visualization conditions to visually test which differences are reliable.
stats_df %>%
group_by(condition, .draw) %>% # marginalize out other manipulations
summarise(pse = mean(pse)) %>%
compare_levels(pse, by = condition) %>%
ggplot(aes(x = pse, y = condition)) +
stat_halfeyeh() +
labs(subtitle = "Differences in PSE Between Visualization Conditions") +
theme_minimal()
It looks like the point of subjective equality is least biased with intervals, with increasing bias toward intervening (i.e., more negative PSE) for HOPs, densities, and quantile dotplots, in that order. Only the pairwise differences of intervals minus densities and intervals minus quantile dotplots are reliable.
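As a numeric companion to this figure (not a table reported in the paper), we can tabulate the same contrasts; intervals that exclude zero correspond to the reliable differences just described.
stats_df %>%
group_by(condition, .draw) %>% # marginalize out other manipulations
summarise(pse = mean(pse)) %>%
compare_levels(pse, by = condition) %>%
mean_qi()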
It seems like the patterns of results for PSE at low vs high variance are different enough that we might want to make different design recommendations depending on the level of variance shown in charts. For this reason, in the paper we present contrasts between visualization designs at low and high variance separately.
Let’s start by looking at contrasts between visualization designs at low variance.
stats_df %>%
filter(sd_diff == 5) %>%
unite(design, c("condition", "means")) %>%
group_by(design, .draw) %>% # group by predictors to keep
summarise(pse = mean(pse)) %>% # marginalize by taking a weighted average
compare_levels(pse, by = design) %>%
ggplot(aes(x = pse, y = design)) +
stat_halfeyeh() +
labs(subtitle = "Differences in PSE Between Visualization Designs at Low Variance") +
theme_minimal()
Beyond what we saw in the chart of interaction effects above, this figure shows that there are no reliable differences among visualization designs that use intervals or HOPs as uncertainty encodings. The same holds for densities and quantile dotplots, with the exception of the comparison between quantile dotplots without means and densities with means, two designs with opposite directions of bias. Densities without means and quantile dotplots with means are the least biased conditions and are reliably different from designs that use intervals and HOPs to encode uncertainty.
Just to reiterate this particularly important comparison: there is no reliable difference between densities without means and quantile dotplots with means.
stats_df %>%
group_by(means, sd_diff, condition, .draw) %>% # marginalize out other manipulations
summarise(pse = mean(pse)) %>%
filter(sd_diff == 5) %>%
unite(vis_cond, condition, means) %>%
filter(vis_cond %in% c("densities_FALSE", "QDPs_TRUE")) %>%
compare_levels(pse, by = vis_cond) %>%
ggplot(aes(x = pse, y = vis_cond)) +
stat_halfeyeh() +
labs(subtitle = "Differences in PSE Densities and QDPs at Low Variance") +
theme_minimal()
Now, we’ll consider contrasts between visualization designs at high variance.
stats_df %>%
filter(sd_diff == 15) %>%
unite(design, c("condition", "means")) %>%
group_by(design, .draw) %>% # group by predictors to keep
summarise(pse = mean(pse)) %>% # marginalize by taking a weighted average
compare_levels(pse, by = design) %>%
ggplot(aes(x = pse, y = design)) +
stat_halfeyeh() +
labs(subtitle = "Differences in PSE Between Visualization Designs at High Variance") +
theme_minimal()
When we look for the least biased distributional encoding at high variance, intervals without means stand out. However, they are not reliably less biased than every other design.
Now, let’s look at contrasts for the impact of the level of variance. This is an exploratory comparison.
stats_df %>%
group_by(sd_diff, .draw) %>% # marginalize out other manipulations (including means present/absent)
summarise(pse = mean(pse)) %>%
compare_levels(pse, by = sd_diff) %>%
ggplot(aes(x = pse, y = "Effect of Variance")) +
stat_halfeyeh() +
labs(x = "Difference in PSE (High - Low Variance)") +
theme_minimal()
People seem to intervene more than they should when uncertainty is high. It may be that users err on the side of caution in decision-making when the span of distributions is larger compared to the width of the axis. This was not really our primary research question, but it is an interesting result that future work should probably investigate further.
JNDs describe a chart user’s sensitivity to effect size information (i.e., evidence) for the purpose of making decisions.
Since we are interested in the way that extrinsic means impact the perception of effect size at different levels of variance, we look at how this effect manifests in JNDs.
stats_df %>%
group_by(means, sd_diff, condition, .draw) %>% # marginalize out other manipulations
summarise(jnd = mean(jnd)) %>%
ggplot(aes(x = jnd, y = condition, group = means, fill = means)) +
stat_slabh(alpha = 0.35) +
labs(subtitle = "JND Interaction") +
theme_minimal() +
facet_grid(sd_diff ~ .)
Let’s look at contrasts for the impact of the mean.
stats_df %>%
group_by(means, sd_diff, condition, .draw) %>% # marginalize out other manipulations
summarise(jnd = mean(jnd)) %>%
compare_levels(jnd, by = means) %>%
ggplot(aes(x = jnd, y = condition)) +
stat_halfeyeh() +
labs(x = "JND Difference (Means present - absent)") +
theme_minimal() +
facet_grid(sd_diff ~ .)
Adding means seems to improve sensitivity for intervals at high variance; none of the other effects are reliable.
Effect of means on JNDs for each combination of visualization condition and level of variance (in figure).
jnd_tbl <- stats_df %>%
group_by(means, sd_diff, condition, .draw) %>% # marginalize out other manipulations
summarise(jnd = mean(jnd)) %>%
compare_levels(jnd, by = means) %>%
mean_qi()
jnd_p_tbl <- stats_df %>%
group_by(means, sd_diff, condition, .draw) %>% # marginalize out other manipulations
summarise(jnd_p_award = mean(jnd_p_award)) %>%
compare_levels(jnd_p_award, by = means) %>%
mean_qi()
jnd_tbl %>% full_join(jnd_p_tbl, by = c("means", "sd_diff", "condition"))
## # A tibble: 8 x 15
## # Groups: means, sd_diff [2]
## means sd_diff condition jnd .lower.x .upper.x .width.x .point.x
## <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 TRUE… 5 densities 0.0133 -0.0846 0.125 0.95 mean
## 2 TRUE… 5 intervals -0.0547 -0.216 0.116 0.95 mean
## 3 TRUE… 5 HOPs -0.0139 -0.158 0.139 0.95 mean
## 4 TRUE… 5 QDPs -0.0188 -0.108 0.0732 0.95 mean
## 5 TRUE… 15 densities -0.0409 -0.103 0.0196 0.95 mean
## 6 TRUE… 15 intervals -0.105 -0.169 -0.0486 0.95 mean
## 7 TRUE… 15 HOPs 0.0175 -0.0676 0.110 0.95 mean
## 8 TRUE… 15 QDPs -0.0320 -0.0828 0.0214 0.95 mean
## # … with 7 more variables: .interval.x <chr>, jnd_p_award <dbl>,
## # .lower.y <dbl>, .upper.y <dbl>, .width.y <dbl>, .point.y <chr>,
## # .interval.y <chr>
Effect of means on JNDs, marginalizing across visualization condition. The effect of adding means is not reliable at either low or high variance in the aggregate (omitted from figure for space).
jnd_tbl <- stats_df %>%
group_by(means, sd_diff, .draw) %>%
summarise(jnd = mean(jnd)) %>%
compare_levels(jnd, by = means) %>%
mean_qi()
jnd_p_tbl <- stats_df %>%
group_by(means, sd_diff, .draw) %>%
summarise(jnd_p_award = mean(jnd_p_award)) %>%
compare_levels(jnd_p_award, by = means) %>%
mean_qi()
jnd_tbl %>% full_join(jnd_p_tbl, by = c("means", "sd_diff"))
## # A tibble: 2 x 14
## # Groups: means [1]
## means sd_diff jnd .lower.x .upper.x .width.x .point.x .interval.x
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 TRUE… 5 -0.0185 -0.0958 0.0576 0.95 mean qi
## 2 TRUE… 15 -0.0402 -0.0843 0.00351 0.95 mean qi
## # … with 6 more variables: jnd_p_award <dbl>, .lower.y <dbl>, .upper.y <dbl>,
## # .width.y <dbl>, .point.y <chr>, .interval.y <chr>
We preregistered comparisons of JNDs per visualization, marginalizing across other manipulations. However, it occurs to us in hindsight that this marginalization corresponds to a visualization that designers cannot render: a chart both with and without means at the same time. Therefore, we omit these comparisons from the paper and present them only in supplemental materials.
stats_df %>%
group_by(condition, .draw) %>% # marginalize out other manipulations
summarise(jnd = mean(jnd)) %>%
ggplot(aes(x = jnd, y = condition, fill = condition)) +
stat_slabh(alpha = 0.35) +
scale_fill_brewer(type = "qual", palette = 2) +
labs(subtitle = "JND Per Visualization Condition") +
theme_minimal() +
theme(legend.position = "none")
stats_df %>%
group_by(condition, .draw) %>%
summarise(
jnd = mean(jnd),
jnd_p_award = mean(jnd_p_award)
) %>%
mean_qi()
## # A tibble: 4 x 10
## condition jnd jnd.lower jnd.upper jnd_p_award jnd_p_award.low…
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 densities 0.502 0.447 0.567 0.0631 0.0575
## 2 intervals 0.520 0.458 0.598 0.0635 0.0579
## 3 HOPs 0.599 0.520 0.696 0.0727 0.0653
## 4 QDPs 0.431 0.388 0.481 0.0555 0.0510
## # … with 4 more variables: jnd_p_award.upper <dbl>, .width <dbl>, .point <chr>,
## # .interval <chr>
Let’s look at contrasts between visualization conditions to visually test which differences are reliable.
stats_df %>%
group_by(condition, .draw) %>% # marginalize out other manipulations
summarise(jnd = mean(jnd)) %>%
compare_levels(jnd, by = condition) %>%
ggplot(aes(x = jnd, y = condition)) +
stat_halfeyeh() +
labs(x = "Differences in JNDs Between Visualization Conditions") +
theme_minimal()
It looks like users are most sensitive to evidence (i.e., JNDs are smallest) in the quantile dotplots condition and least sensitive with HOPs. Only the differences between quantile dotplots and the other conditions are reliable.
In the paper, we look at the JNDs for different visualization designs when we average over variance to give a sense of overall effectiveness on this metric.
stats_df %>%
group_by(condition, means, .draw) %>% # marginalize out other manipulations
summarise(jnd = mean(jnd)) %>%
ggplot(aes(x = jnd, y = condition, group = means, fill = means)) +
stat_slabh(alpha = 0.35) +
theme_minimal() +
theme(legend.position = "none")
Let’s look at contrasts between each of these designs.
stats_df %>%
unite(design, c("condition", "means")) %>%
group_by(design, .draw) %>% # group by predictors to keep
summarise(jnd = mean(jnd)) %>% # marginalize by taking a weighted average
compare_levels(jnd, by = design) %>%
ggplot(aes(x = jnd, y = design)) +
stat_halfeyeh() +
labs(subtitle = "Differences in JND Between Visualization Designs, Averaging Over Variance") +
theme_minimal()
We can see that quantile dotplots, with or without means, have reliably smaller JNDs than the other conditions, with the exception of the contrasts between quantile dotplots without means and densities with or without means. These are the only reliable differences between designs.
Effect of means on JNDs for each visualization condition, marginalizing over variance (in figure).
jnd_tbl <- stats_df %>%
group_by(condition, means, .draw) %>% # marginalize out other manipulations
summarise(jnd = mean(jnd)) %>%
compare_levels(jnd, by = means) %>%
mean_qi()
jnd_p_tbl <- stats_df %>%
group_by(condition, means, .draw) %>% # marginalize out other manipulations
summarise(jnd_p_award = mean(jnd_p_award)) %>%
compare_levels(jnd_p_award, by = means) %>%
mean_qi()
jnd_tbl %>% full_join(jnd_p_tbl, by = c("condition", "means"))
## # A tibble: 4 x 14
## # Groups: condition [4]
## condition means jnd .lower.x .upper.x .width.x .point.x .interval.x
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 densities TRUE… -0.0138 -0.0802 0.0594 0.95 mean qi
## 2 intervals TRUE… -0.0800 -0.177 0.0141 0.95 mean qi
## 3 HOPs TRUE… 0.00180 -0.0921 0.104 0.95 mean qi
## 4 QDPs TRUE… -0.0254 -0.0825 0.0336 0.95 mean qi
## # … with 6 more variables: jnd_p_award <dbl>, .lower.y <dbl>, .upper.y <dbl>,
## # .width.y <dbl>, .point.y <chr>, .interval.y <chr>
JNDs with and without means, marginalizing over variance. These numbers help us to contextualize the overall effect of visualization designs on JNDs.
stats_df %>%
group_by(condition, means, .draw) %>% # marginalize out other manipulations
summarise(
jnd = mean(jnd),
jnd_p_award = mean(jnd_p_award)
) %>%
mean_qi()
## # A tibble: 8 x 11
## # Groups: condition [4]
## condition means jnd jnd.lower jnd.upper jnd_p_award jnd_p_award.low…
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 densities FALSE 0.509 0.453 0.576 0.0641 0.0583
## 2 densities TRUE 0.495 0.430 0.578 0.0620 0.0556
## 3 intervals FALSE 0.559 0.486 0.654 0.0679 0.0612
## 4 intervals TRUE 0.480 0.410 0.575 0.0592 0.0528
## 5 HOPs FALSE 0.598 0.517 0.700 0.0727 0.0650
## 6 HOPs TRUE 0.600 0.505 0.718 0.0727 0.0637
## 7 QDPs FALSE 0.443 0.395 0.500 0.0571 0.0519
## 8 QDPs TRUE 0.418 0.366 0.481 0.0540 0.0484
## # … with 4 more variables: jnd_p_award.upper <dbl>, .width <dbl>, .point <chr>,
## # .interval <chr>
Now, let’s look at contrasts for the impact of the level of variance. This is an exploratory comparison.
stats_df %>%
group_by(sd_diff, .draw) %>% # marginalize out other manipulations (including means present/absent and vis condition)
summarise(jnd = mean(jnd)) %>%
compare_levels(jnd, by = sd_diff) %>%
ggplot(aes(x = jnd, y = "effect of variance")) +
stat_halfeyeh() +
labs(x = "JND Difference (Variance high - low)") +
theme_minimal()
Users seem to be consistently more sensitive to evidence (smaller JNDs) when uncertainty is high. This might be because charts in the high uncertainty condition use more of the chart space to convey effect size, whereas the low uncertainty charts contain a lot of white space, so smaller visual differences convey the same effect size.
We want to explore how perceptual bias as measured by LLO slopes impacts decision quality as measured by JND and PSE. To do this, we derive point estimates of LLO slope, JND, and PSE for each worker in our data set and combine these statistics into one dataframe.
# get linear log odds (LLO) slopes per worker
wrkr_llo_slopes_df <- model_df %>%
group_by(worker_id, means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.p_sup, n = 500) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(llo_slope = .value) %>%
group_by(worker_id, condition) %>% # calculate point estimate of marginal LLO slope per worker
summarise(llo_slope = weighted.mean(llo_slope))
# get logistic regression slopes per worker
wrkr_logistic_slopes_df <- model_df %>%
group_by(worker_id, means, sd_diff, condition, trial, start_means) %>%
data_grid(evidence = c(0, 1)) %>%
add_fitted_draws(m.decisions, scale = "linear", n = 500, seed = 1234) %>%
compare_levels(.value, by = evidence) %>%
rename(slope = .value)
# get logistic regression intercepts per worker
wrkr_logistic_intercepts_df <- model_df %>%
group_by(worker_id, means, sd_diff, condition, trial, start_means) %>%
data_grid(evidence = 0) %>%
add_fitted_draws(m.decisions, scale = "linear", n = 500, seed = 1234) %>%
rename(intercept = .value)
# join dataframes for logistic slopes and intercepts, calculate PSE and JND
wrkr_logistic_stats_df <- wrkr_logistic_slopes_df %>%
full_join(wrkr_logistic_intercepts_df, by = c("worker_id", "means", "sd_diff", "condition", "trial", "start_means", ".draw")) %>%
mutate(
pse = -intercept / slope,
jnd = qlogis(0.75) / slope
) %>%
group_by(worker_id, condition) %>% # calculate point estimate of marginal JND and PSE per worker
summarise(
pse = weighted.mean(pse),
jnd = weighted.mean(jnd)
)
# join the dataframes of summary statistics per worker
wrkr_stats_df <- wrkr_llo_slopes_df %>%
full_join(wrkr_logistic_stats_df, by = c("worker_id", "condition"))
Prior work (Khaw et al., cited in paper) explained bias in PSE in terms of sensitivity to signal as measured by JND. Let’s plot these things together to see if we have a similar correspondence in our data.
wrkr_stats_df %>%
filter(jnd > 0) %>%
ggplot(aes(x = jnd, y = pse)) +
geom_point(alpha = 0.35) +
coord_cartesian(
xlim = c(0, 10),
ylim = c(-20, 20)
) +
theme_minimal()
We see here that PSEs closer to zero are predicted by JNDs closer to zero. That is, workers who have greater sensitivity to effect size for the purpose of decision-making also tend to make more utility-optimal decisions.
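As an informal companion to this visual impression (our own exploratory check, not an analysis from the paper), we can compute a rank correlation between the magnitude of decision bias |PSE| and JND among the workers shown in the plot.
wrkr_stats_df %>%
ungroup() %>%
filter(jnd > 0, jnd < 10, abs(pse) < 20) %>% # match the axis limits above
summarise(spearman_rho = cor(abs(pse), jnd, method = "spearman"))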
However, since we have a separate task which gauges bias in the perception of effect size, we can also look at how performance on the estimation task predicts performance on the decision task.
Now let’s look at the relationship between LLO slopes and JNDs. This should give a rough indication of how much perceptual accuracy for effect size judgments translates into sensitivity to effect size information for the purpose of decision-making. We’ve had to filter out some workers with extreme JNDs to get a readable chart; the workers shown are the subset with JND estimates in a reasonable range.
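Before plotting, we can count how many workers fall outside this range (the thresholds below are ours, chosen to match the axis limits in the plot that follows).
wrkr_stats_df %>%
ungroup() %>%
summarise(
n_workers = n(),
n_nonpositive_jnd = sum(jnd <= 0, na.rm = TRUE),
n_extreme_jnd = sum(jnd > 10, na.rm = TRUE)
)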
wrkr_stats_df %>%
filter(jnd > 0) %>%
ggplot(aes(x = llo_slope, y = jnd)) +
geom_point(alpha = 0.35) +
coord_cartesian(ylim = c(0, 10)) +
theme_minimal()
While more of the high JNDs (indicating insensitivity) belong to workers with low LLO slopes (indicating a tendency to underestimate effect size), most workers have relatively small JNDs across the full range of observed LLO slopes, suggesting that perceptual accuracy and sensitivity are only loosely linked, with additional factors probably impacting decision-making.
What about the relationship between LLO slopes and PSE? This should give a rough sense of how much perceptual bias translates into bias in decision-making. Again, we’ve had to filter out some workers with extreme PSEs to get a readable chart.
wrkr_stats_df %>%
ggplot(aes(x = llo_slope, y = pse)) +
geom_point(alpha = 0.35) +
coord_cartesian(ylim = c(-20, 20)) +
theme_minimal()
Here again we see that the most extreme biases in decision-making (PSEs far from 0) tend to correspond with the most extreme tendency to underestimate effect size (slopes less than 1). While biases in decision-making are less common among users with more accurate effect size judgments, the opposite is not the case: there are many users with poor perceptual accuracy who make close to utility-optimal decisions. This suggests that perceptual accuracy does not determine a user’s ability to make a decision. The implication for the visualization community is that we need to seek a better understanding of how performance on these tasks is related.
Part of this mismatch between perceptual performance and decision-making performance may be explained by the fact that our magnitude estimation task was more difficult than the decision task. Some users struggled with the more granular response scale of probability of superiority in pilot testing; by comparison, a binary decision is rather straightforward. We also incentivized the decision task but not the magnitude estimation task. Although we told participants that the best way to maximize their bonus was to answer both questions to the best of their ability, some participants may have sped through the probability of superiority judgments and focused on the decision task.